ABSTRACT



This report analyzes a two-step program designed to achieve security in the administration of the English Comprehension Level (ECL) test given by the Defense Language Institute. Since the ECL test score is the basis for major administrative and academic decisions, there is great motivation for performing well, and student test compromise is prevalent, especially on tests given in the students' own country. The best way to combat compromise is to have a large number of test forms. This report first presents an analysis of the estimated cost of test compromise. There is a discussion of how the problem was handled, and a formula for estimating the cost of compromise is given. The second part of the study describes the development of conceptual tools and computer programs to enable a digital computer to generate valid ECL test-item lists in quantity. Details and statistics are provided along with a discussion of the
FOREWORD



This is the Final Report of work performed at Southwest Research Institute, 8500 Culebra Road, San Antonio, Texas, under Contract No. F41609-70-C-0030, for Defense Language Institute, English Language Branch, Lackland Air Force Base, Texas. The contract period was 1 May 1970 to 1 May 1971.

The Technical Monitor was Francis A. Cartier, Ph.D., Chief, Development Division, English Language Branch; his supervision and technical advice were most helpful throughout the project. At Southwest Research Institute, the Behavioral Sciences Section, Department of Bioengineering, had responsibility for the project. The computer programs were developed in the Computer Laboratory by Mr. Thomas R. Jackson, manager of that facility, while the cost/benefits study was performed within the Operations Research Section, Department of Electronic Systems Research, by Messrs. Thomas E. Hawkins and Richard A. McCoy; the early conceptual development of the approach to the cost/benefits study benefited significantly from the contributions of Dr. W. R. Brian Caruth, then manager of the Operations Research Section. Mr. Louis S. Berger was Principal Investigator and Project Manager.
I. INTRODUCTION



To support the Military Assistance and Foreign Military Sales programs of the United States, selected foreign military personnel are sent to the United States for technical training in a wide variety of skills. Because knowledge and comprehension of English are essential to successful completion of this technical training, Defense Language Institute (DLI) has developed a comprehensive system for the training of foreign nationals in English, both “in-country” and in the United States at DLI English Language Branch, Lackland Air Force Base, San Antonio, Texas. Within this system, the English Comprehension Level test (ECL) is the basic tool for measuring a student’s proficiency in the language; broadly speaking, both a potential student's initial command of the language and his academic progress during language training are measured by ECL tests.

A number of major administrative and academic decisions are based on a student’s ECL score. The initial evaluation, usually performed in the applicant’s home country, determines whether or not a candidate is ready for training in the United States. Then, should he qualify, the following far-reaching programming and scheduling decisions are made on the basis of his ECL score: if he is scheduled for language training at English Language Branch, the ECL is used to place him within the curriculum, to measure his academic progress, to predict the duration of his English-language training, and to provide a criterion for his graduation from language training; in other instances, his ECL score may be high enough to enable him to bypass language training in the United States entirely. Thus, decisions committing considerable resources are made on the basis of ECL test score results.

Viewed from the student’s vantage point, the motivation for performing well on an ECL test, be it a screening admissions test overseas or an academic evaluation test at English Language Branch, is high because of the socioeconomic and other personal rewards that derive from satisfactory language performance as measured by the ECL test. Consequently, student test compromise is prevalent, particularly in screening tests in a student’s home country. The two major avenues to compromise are prior knowledge of test questions and the exchange of information during test administration. Both of these compromise techniques are effectively countered by the use of a large number of test forms. Unfortunately, because of the rather complex categorical and statistical specifications, assembling an ECL test form is an intricate procedure, and the time required for assembling alternate ECL forms by hand has, in the past, limited the number of operational ECL forms available to DLI.

The program described in this report was designed to meet the need for ready availability of a large number of valid ECL forms; we proposed to develop a computer methodology that would construct a very large number of ECL form lists to specifications from a basic test item pool provided by the sponsor. Two tasks were proposed: Task I would develop the desired ECL test generation methodology; Task II would determine the magnitude of the economic penalties for compromise of the ECL tests, and thereby provide a realistic basis for evaluating further applications of computer technology to test generation.

The goal of Phase I, Task I, was to develop a computer methodology that would assemble appropriate sets of test items (120 items each), from which the Sponsor could prepare ECL test forms. The methodology to be developed under Task I was required to select ECL test item sets that would perform at least as well as the sets constituting the operational forms now in use at DLI; it was also to be compatible with standard computer systems and ultimately expandable in a straightforward way to wider applications in other DLI programs.
The methodology that was developed in the course of this project was thoroughly evaluated in two validation studies at English Language Branch. These studies led the Sponsor to conclude that the methodology is generating valid ECL test item lists and that computer-selected test forms can be put into operational use at English Language Branch. We feel that two conditions at English Language Branch contributed vitally to the success of Task I: first, a proven set of test items was at hand, together with a well-defined operational taxonomy; second, the staff's insight into test construction science enabled them to provide valuable guidance to us during development of the computer methodology.

The Task II effort to estimate the economic penalties of compromise was based on a very conservative analytical approach. Assumptions were conservative, and where the complexities of a situation or the availability of data precluded sound analysis, penalty estimates were not incorporated in the computations. The basic data were acquired largely from interaction with English Language Branch faculty, supervisory, and management staff, supplemented by available source records. Despite the conservative approach, the estimated economic penalties proved to be quite large. The calculations indicated that the cost of compromise was approximately $76,000 for every 1000 students entering English Language Branch, and approximately $117,000 for every 1000 direct-entry students (those bypassing language training in the United States). Thus, the annual costs of ECL test compromise for the period analyzed were very conservatively estimated to be on the order of 1.25 million dollars.

In the main body of this report, the Task II effort is reported first (Section II) in order to 
provide a general framework within which the Task I technical discussion (Section III) is then 
presented. 



II. TASK II EFFORT - COST/BENEFITS ANALYSIS



A. Introduction 

1. Purpose of Task II Research

The purpose of the research under Task II was to determine the magnitude of the economic penalty of in-country compromise of the ECL tests, and thereby provide a realistic basis for evaluating the cost effectiveness of wider applications of computer technology to ECL test generation. Because the budget for Task II was one-fourth of the total project budget, the research had to rely on a number of approximations. The task was completed within the limitations of this modest budget.



2. Approach to Task II Research 

There are three major subsystems within the overall system of training foreign nationals: (1) an English Language Instructor Program, which trains foreign nationals at Lackland Air Force Base for return to their own countries as instructors in the in-country English Language Training Program; (2) the in-country English Language Training Program, followed by direct entry into technical training; and (3) the in-country English Language Training Program, followed by intermediate training in General and/or Specialized English at Lackland Air Force Base prior to graduation to technical training. Two of these subsystems were considered in this research effort. The English Language Instructor Program was considered by the senior personnel of English Language Branch (P1)* to be relatively free of ECL test compromise, and it was excluded from the analysis of the magnitude of the economic penalty from compromise. Both of the other major subsystems (intermediate training and direct entry) were considered in the analysis.

Two other factors were also excluded from the analysis (P1). First, the senior personnel of English Language Branch accept that some test compromise occurs while students undergo intermediate English language training at Lackland Air Force Base, but they considered it minimal and under effective control. Therefore, the research of Task II was limited to an analysis of the economic penalty stemming from compromise of ECL tests given in-country. Second, if cheating on ECL tests given in-country were reduced and/or eliminated, one anticipated result would be an increase in the scope and cost of the in-country training programs. However, because the cost of this training is borne by the host countries, except for certain countries in Southeast Asia, the senior personnel of English Language Branch, Defense Language Institute, considered it to be outside the scope of this research effort.

It became clear early in the research effort that there was no direct way to measure the economic penalty of ECL test compromise, that historical source data concerning the direct-entry program were scarce, and that a large amount of historical source data existed concerning the intermediate training program conducted at Lackland Air Force Base. Based upon these early findings, three guidelines for the research effort were developed at SwRI and approved by the senior personnel of English Language Branch (P1). First, because the economic penalties had to be measured indirectly, the techniques used for defining test compromise and for translating the penalties into economic terms should be conservative. Second, because of the availability of source data, emphasis should be given to the analysis of the intermediate English Language Training Program conducted by English Language Branch at Lackland Air Force Base. Third, sampling techniques should be used for the analysis of large numbers of historical records. These three guidelines were followed in the research of Task II.

The results of the analysis of the intermediate English Language Training Program are 
contained in Part B, while Part C contains the results of the analysis of the direct entry program. 
Part D contains a summary of the magnitude of the economic penalties being paid by the U.S.
Government and several of its agencies for in-country compromise of the ECL tests. Parts E and F 
contain a list of personal interviews and telephone conversations, and a listing of source documents, 
respectively. 

B. Analysis of the Cost of In-Country ECL Test Compromise for the Intermediate English Language Training Program

1. General

There are four training programs presently being conducted by English Language Branch at Lackland Air Force Base: (1) General English Training; (2) Specialized English Training; (3) General and Specialized English Training; and (4) I & M Training, specifically designed for personnel from Viet Nam. Based upon personal interviews with the Section Chiefs responsible for each of these programs (P2, P3, P4), it was concluded that the training programs were similar enough to permit the development of a generalized scheme for assessing the economic penalty stemming from in-country ECL test compromise. Each of the training programs must accommodate the fact that many students arriving in the United States score significantly lower on their entry ECL tests than they did on their final in-country ECL test. A certain amount of this drop in scores can be attributed to the time elapsed between tests, but experience has shown that such time-related drops are generally offset by a rapid rise in ECL scores once a student resumes course work at ELB. Therefore, significant drops that are not followed by a rapid recovery can logically be attributed to in-country test compromise.



Compromise results in significant reprogramming of course duration at Lackland, remedial efforts to attempt to graduate the students on schedule, and administrative burdens of considerable magnitude. It was also determined that, for each program, the decision-making process concerning training duration and graduation is controlled externally to English Language Branch, and that factors other than final ECL test scores meeting prescribed criteria are taken into consideration. In other words, the training program at Lackland is conducted in a flexible, rather than a rigid, environment.

Based upon these considerations, it was concluded that the generalized scheme for assessing the economic penalty must include criteria for judging who did and who did not cheat, together with algorithms for translating the operational penalties of additional course time and remedial and administrative burdens into monetary terms, and must take into account the fact that ECL test scores were not the only factors considered in the decision-making process. It was also concluded that the generalized scheme must, of necessity, be designed around the historical source data that were available.



2. Data Availability 



Three sources of historical data were available: (1) “Quarterly Training Statistics” for fiscal years 1967, 1968, 1969, and 1970; (2) “Student Performance Records” for calendar years 1969 and 1970; and (3) an “I & M Summary” for six training classes (Groups 12 to 17) of Vietnamese students.

a. Quarterly Training Statistics. The Quarterly Training Statistics (D3) provide information on the number of students by country and sponsoring service, average course length, number of failures, extensions in course length, and reductions in course length. Preliminary review of this information indicated that it was summarized for a different purpose and in a manner that made it of limited use to the research effort; however, it provided the only available data on the total number of students passing through the system and was therefore valuable as a data base for extrapolating the analysis of selected samples.

b. Student Performance Records. The Student Performance Records (D4) provide comprehensive data on each student passing through the system. Specifically, for each student, there are recorded data on country of origin, date of entry, date of graduation, scheduled training period, actual training period, program, in-country ECL test score, entry ECL test score, biweekly ECL test scores, final ECL test score, required ECL test score, remedial action, and disposition. Additionally, the original orders, which are filed with the Student Performance Record, provide data on the sponsoring service and type of contract (sales or grant). Preliminary review of these forms indicated that, in addition to providing a basis for country and date-of-entry matrices, they could be used to determine the extent of reprogramming of training (either reductions or extensions), the percent graduated/failed, the percent meeting the required ECL at graduation, and the percent within x points of the in-country ECL at 2-week intervals. Based upon this preliminary review, it was concluded that the Student Performance Records were an excellent source of historical data, and the best available.

c. I & M Summary. This summary, a computer printout, provided in-country, entry, entry-plus-1-week, and entry-plus-2-weeks ECL test scores for the Vietnamese students in six groups. It also contained historical data on the ECL test forms that had been used for the in-country test. Preliminary review of this summary indicated that it would be extremely valuable in the development of criteria for in-country cheating. Fortunately, the six groups used both old and new forms of the ECL test, and the time span extended to periods before and after changes were made in the administration of ECL tests in Viet Nam. Several members of the English Language Branch staff had indicated that comparing the periods before and after October 1969 should give an indication of the extent of in-country cheating (P2, P3). On the basis of this review, it was concluded that criteria could be developed to decide who had and who had not cheated on the in-country ECL. The rationale for the development of these criteria is described in the following section.



3. Compromise Criteria 



In order to establish a realistic and conservative criterion for who did and who did not cheat on the in-country ECL tests, consideration was given both to the views of selected personnel at English Language Branch, Lackland Air Force Base, and to the analysis of the available data from the I & M students from Viet Nam.



All three Section Chiefs (P2, P3, P4) at English Language Branch, Lackland Air Force Base, indicated that there is a time delay between the final in-country ECL test (which determines whether or not a particular student is qualified to move from the in-country training program to the intermediate language training program at Lackland) and the entry ECL test. This delay may be as long as several months and is typically accompanied by a decrease in ECL test scores at entry, but students who had not cheated on their in-country ECL test close that gap within 2 weeks. One Section Chief (P4) stated that he was convinced that the following would be appropriate assumptions for a definition of cheating:

(1) If the ECL test score 2 weeks after entry into the intermediate program was more 
than 10 points below the in-country ECL, there was a 100-percent probability that 
cheating had occurred on the in-country ECL; 

(2) If the ECL test score 2 weeks after entry into the intermediate program was more than 7 points, but less than 10 points, below the in-country ECL, there was some probability that cheating had occurred on the in-country ECL; and

(3) If the ECL test score 2 weeks after entry into the intermediate program was less 
than 7 points below the in-country ECL, there was a zero probability that cheating 
had occurred on the in-country ECL. 

The data concerning the I & M Groups (D2) included the records of forty students who were given final in-country ECL tests on versions of the ECL test so new as to make compromise difficult. Also available were the records of several hundred students who had taken their final in-country ECL tests on versions of the ECL test that had been in use for about a year. It was hypothesized that there would be a significant difference between these two groups of students. To test this hypothesis, a comparison was made between the forty students from Groups 15-16-17 who had used the new versions of the ECL test form and forty students selected at random from Groups 12-13-14 who had used old versions of the ECL test. The results of this comparison are shown graphically in Figure 1.

On the basis of this analysis, a criterion for classifying students into cheater and non-cheater groups was developed. The criterion uses two measures of a student’s ECL performance history: the difference between his in-country and entry ECL scores, and the difference between his in-country and entry-plus-2-weeks scores. For the purpose of our study, a student was classified as a cheater if (1) the difference between his in-country ECL score and his entry score was greater than 15, and also (2) the difference between his in-country ECL score and his entry-plus-2-weeks ECL score was greater than 10. Unless both of these conditions were satisfied, a student was classified as a non-cheater. This criterion was considered conservative and was recommended by the research team and approved for use in the analysis of the economic penalties associated with the English Language Branch intermediate training program (P7).
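The adopted criterion reduces to a two-condition test on three scores per student. The Python sketch below is a minimal illustration, assuming integer ECL scores and reading "greater than" as a strict inequality; the function name and the sample score histories are hypothetical, not drawn from the Student Performance Records.

```python
def is_cheater(in_country: int, entry: int, entry_plus_2wk: int) -> bool:
    """Classify a student under the adopted compromise criterion (sketch).

    Both conditions must hold; otherwise the student is a non-cheater.
    """
    drop_at_entry = in_country - entry            # condition (1): must exceed 15
    drop_after_2wk = in_country - entry_plus_2wk  # condition (2): must exceed 10
    return drop_at_entry > 15 and drop_after_2wk > 10


# Hypothetical score histories: (in-country, entry, entry + 2 weeks).
examples = {
    "large drop, no recovery": (75, 58, 62),
    "large drop, rapid recovery": (75, 58, 68),
    "moderate entry drop": (75, 62, 60),
}
for label, scores in examples.items():
    print(label, is_cheater(*scores))
```

Requiring both conditions is what makes the rule conservative: a student whose score recovers within 2 weeks, or whose drop at entry is moderate, is counted as a non-cheater.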

4. Penalty Measures 

Logically, it appeared that the penalties of compromise were:

(1) Ultimate failure or poor performance in subsequent technical training; 

(2) Failure at English Language Branch; 

(3) Additional training at English Language Branch; 

(4) Required remedial help while at English Language Branch; and 

(5) Administrative burden, including the penalties of missed quotas and deviations from 
sequential schedules. 
FIGURE 1. HYPOTHESIS TESTING OF COMPROMISE CRITERIA



Of these, the first three could be calculated, but the last two factors could not be developed adequately for use in the study. Unquestionably, there are “administrative” and “remedial” burdens for poor performers, but no effective technique (short of a management audit) that would reveal the magnitude of these burdens could be developed within the limited scope of this task. Therefore, these last two penalties were not included in the analysis.

“Failure at English Language Branch” and “Additional Reprogramming” were factors available from the Student Performance Record and readily converted to economic terms. For students attending English Language Branch on a grant basis, the direct costs borne by the United States are $60 per man-week tuition; $9 per man-day per diem ($63 per week); and $3 to $4 per man-day subsistence ($21 to $28 per week). On the basis of these direct costs, which were provided by the Comptroller of English Language Branch, it was concluded that a conservative estimate of the cost of reprogrammed training was $150 per man-week. This figure admittedly ignores many indirect overhead costs borne by the United States and the pay of students borne by the host country, but it does meet the criterion of conservativeness. In the case of a student who has cheated to gain entry to English Language Branch and then fails and is sent home, the direct penalty is the entire cost of training plus round-trip transportation, which can be calculated. The training period penalty is $150 per man-week, and the travel costs are documented in a current Air Force directive (D9). Again, many costs are ignored by this approach, but it is conservative.
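The weekly figure can be checked against the Comptroller's component costs. The Python sketch below assumes a 7-day week for converting the per-day items; the function names and the sample travel fare are hypothetical (the actual fares are in directive D9, which is not reproduced here).

```python
TUITION_PER_WEEK = 60         # $ per man-week
PER_DIEM_PER_DAY = 9          # $ per man-day
SUBSISTENCE_PER_DAY = (3, 4)  # $ per man-day, low and high
DAYS_PER_WEEK = 7
RATE_PER_WEEK = 150           # conservative figure adopted in the report


def weekly_direct_cost(subsistence_per_day: int) -> int:
    """Tuition plus per diem plus subsistence for one man-week."""
    return TUITION_PER_WEEK + DAYS_PER_WEEK * (PER_DIEM_PER_DAY + subsistence_per_day)


def failure_penalty(weeks_trained: float, round_trip_travel: float) -> float:
    """Training period at the adopted rate plus round-trip transportation."""
    return RATE_PER_WEEK * weeks_trained + round_trip_travel


low, high = (weekly_direct_cost(s) for s in SUBSISTENCE_PER_DAY)
print(low, high)                    # 144 151
print(failure_penalty(10, 1000.0))  # hypothetical: 10 weeks trained, $1000 fare
```

The direct costs span $144 to $151 per man-week, so the adopted $150 sits near the top of that range; as the text notes, its conservativeness rests on the overhead and student pay that the figure leaves out.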

The penalty for poor performance or failure in subsequent technical training proved impossible to measure directly (P6). However, because of its potential magnitude, it was considered highly desirable to develop relatively indirect techniques for measuring this penalty. A brief analysis of 10 cheaters and 10 non-cheaters selected at random indicated a substantial difference between the two groups: cheaters were more likely than non-cheaters to graduate from the training at Lackland with final ECL’s below the required ECL. Based upon this finding, it was concluded that one indirect measurement of the penalty during technical training could be made by calculating the amount of additional training that would have been required to raise a student's ECL from his final score to the required score, but that was not given because external factors dictated the student's graduation. This technique admittedly is most indirect, but it does give some measure of the penalty during technical training and, if it errs, it errs on the conservative side. It was approved for use in the subsequent analysis (P7). The amount of training required was estimated from student training time versus achievement rate as a function of present ECL (P2). Figure 2 shows a smoothed curve representing the averaged composite ramp function that relates ECL test score deficit to the additional training time estimated to be required to make up the deficit. Thus, for a stipulated increase in ECL test score, the amount of required incremental training can be obtained and converted into an economic penalty by applying the average training cost of $150 per man-week.
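The conversion from ECL deficit to dollars can be sketched as a curve lookup with linear interpolation. In the Python sketch below, the breakpoints are hypothetical placeholders, not values read from Figure 2, and extending the last segment beyond the final breakpoint is likewise an assumption; only the $150 per man-week rate comes from the report.

```python
from bisect import bisect_right

# HYPOTHETICAL (ECL deficit, additional weeks) breakpoints standing in for the
# smoothed Figure 2 curve; substitute values read from the actual figure.
CURVE = [(0, 0.0), (5, 2.0), (10, 5.0), (20, 12.0)]
RATE_PER_WEEK = 150  # $ per man-week, from the report


def weeks_to_make_up(deficit: float) -> float:
    """Piecewise-linear interpolation of extra training weeks for an ECL deficit."""
    if deficit <= 0:
        return 0.0
    xs = [x for x, _ in CURVE]
    # Index of the segment's upper breakpoint; past the end, reuse the last segment.
    i = min(bisect_right(xs, deficit), len(CURVE) - 1)
    (x0, y0), (x1, y1) = CURVE[i - 1], CURVE[i]
    return y0 + (y1 - y0) * (deficit - x0) / (x1 - x0)


def deficit_penalty(final_ecl: float, required_ecl: float) -> float:
    """Dollar penalty for graduating below the required ECL."""
    return RATE_PER_WEEK * weeks_to_make_up(required_ecl - final_ecl)


print(deficit_penalty(75, 80))  # 5-point deficit -> 2.0 weeks -> 300.0
```

A student who graduates at or above the required ECL has no deficit and therefore contributes no penalty under this term.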

It was necessary to include one additional factor because of the flexible environment of foreign national training. Some students who have not cheated fail; others require longer training than originally scheduled, while some require less. In other words, there is a deviation from the normal path even if there has been no cheating. This was termed system variation, and provisions were made for its inclusion in the analytical logic. An estimate of this variation was obtained by calculating the three penalties mentioned above (failure at English Language Branch, reprogramming, and graduation at less than the required ECL) for non-cheater groups, since, by assumption, the penalties for these groups were caused by system variation only.
FIGURE 2. INCREMENTAL TRAINING DATA

5. Analytic Logic and Approach

For the purposes of analysis, it was found desirable to develop the penalty algorithms 
into a computer program which solved the penalty expression. 

$$P_{cg} = P_f + P_{rt} + P_a - P_{sv}$$

where

$P_{cg}$ = dollar penalty for the compromise group
$P_f$ = dollar penalty for failure at English Language Branch
$P_{rt}$ = dollar penalty for reprogrammed training
$P_a$ = dollar penalty for graduation at less than required ECL
$P_{sv}$ = dollar penalty for system variation
The input data source, input data format, computer program flow diagram, computer program listing, and computer program variable array keys were made available to the Technical Monitor.

It was not considered necessary to analyze the entire population of enrollees at English Language Branch; yet, in the interests of developing a defensible analysis, a large sample size was considered necessary. The total population enrolling at English Language Branch has been: FY67, 2102; FY68, 2534; FY69, 2533; FY70, 3823. It was decided (P7) that 50 percent of the students from the twelve countries sending the largest number of students during calendar year 1969 would be an adequate sample for extrapolation. This resulted in a sample size of 1032, split as evenly as the source data permitted among four quarters. The sampling had to be modified for those countries for which historical records were not available for the last quarters of the calendar year; in these instances, the third and fourth quarters of 1968 were substituted. In all cases, the data were chosen to run consecutively for four quarters for each country. The sample used in the computer analysis is shown in Table I.
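The quarterly layout of the Table I sample can be cross-checked against the other tables. In the Python sketch below, the counts are transcribed from Table I, with None marking a quarter in which a country was not sampled; the variable names are illustrative. The column sums match the quarterly totals of Table III except for the first quarter of 1969, where Table III lists 277 rather than 278, consistent with the footnoted student in Libya's first-quarter 1969 count whose training was interrupted.

```python
# Sample counts per country; columns: 3Q68, 4Q68, 1Q69, 2Q69, 3Q69, 4Q69.
# None marks a quarter in which the country was not sampled.
TABLE_I = {
    "Germany":      [None, 10, 12, 12, 2, None],
    "Iran":         [None, 60, 55, 60, 60, None],
    "Israel":       [None, None, 30, 32, 28, None],
    "Korea":        [None, 35, 35, 35, 16, None],
    "Laos":         [12, 12, 12, 12, None, None],
    "Libya":        [13, 12, 11, 12, None, None],
    "Morocco":      [None, None, 15, 15, 15, None],
    "Saudi Arabia": [None, 8, 12, 7, 6, None],
    "Spain":        [None, None, 12, 15, 12, None],
    "Thailand":     [None, 16, 15, 16, 13, None],
    "Turkey":       [None, 9, 11, 7, 12, None],
    "Viet Nam":     [None, 20, 58, 60, 60, 40],
}

# Row totals (per country) and column totals (per quarter), skipping empty cells.
country_totals = {c: sum(n for n in row if n is not None) for c, row in TABLE_I.items()}
quarter_totals = [
    sum(row[q] for row in TABLE_I.values() if row[q] is not None) for q in range(6)
]
print(country_totals)
print(quarter_totals)                # [25, 182, 278, 283, 224, 40]
print(sum(country_totals.values()))  # 1032
```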

6. Results of the Analysis of the Sample Population 

The analysis of the sample population data was conducted on an individual basis for each 
of the twelve countries comprising the sample. 

The output provides data on two groups of students for each country: those students who, according to the compromise criteria, did and did not cheat on their in-country ECL test. Each of these groups was further subdivided into students who failed their English Language Branch training, those who graduated with less than the required ECL, and those who graduated at or above the required ECL, for each of the four quarters considered in the analysis. Finally, a composite tabulation was prepared showing the total results for all countries. The complete computer printout was transmitted to the Technical Monitor.

TABLE I. POPULATION SAMPLE USED FOR ENGLISH LANGUAGE
BRANCH COMPROMISE ANALYSIS

Country        Third    Fourth   First    Second   Third    Fourth
               Quarter  Quarter  Quarter  Quarter  Quarter  Quarter
               1968     1968     1969     1969     1969     1969     Total

Germany        -        10       12       12       2        -        36
Iran           -        60       55       60       60       -        235
Israel         -        -        30       32       28       -        90
Korea          -        35       35       35       16       -        121
Laos           12       12       12       12       -        -        48
Libya          13       12       11*      12       -        -        48
Morocco        -        -        15       15       15       -        45
Saudi Arabia   -        8        12       7        6        -        33
Spain          -        -        12       15       12       -        39
Thailand       -        16       15       16       13       -        60
Turkey         -        9        11       7        12       -        39
Viet Nam       -        20       58       60       60       40       238

Total                                                                1032

*One student had training interrupted.

TABLE II. DATA FOR PENALTY CALCULATIONS

                                     Compromise    Non-Compromise
                                     Subgroup      Subgroup
Subgroup Size                        14.15%        85.76%
Failed                                1.37%         0.68%
Graduated less than Required ECL     62.33%        26.55%
Graduated at/above Required ECL      36.30%        72.77%
Penalty/Student                      $766.83       $232.22

Cost of Compromise/Student Cheating  $534.61
Cost of Compromise for Sample        $77,792.00

Highlights from the computer summary tabulation are shown in Table II. 
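The last rows of Table II follow the penalty expression: the non-compromise subgroup's per-student penalty serves as the estimate of system variation, so the cost of compromise per cheating student is the difference between the two subgroup penalties. The short Python check below uses the Table II figures and the subgroup counts of Table III (146 and 885 of the 1032 sampled students); the variable names are illustrative.

```python
from decimal import Decimal

# Per-student penalties from Table II.
penalty_compromise = Decimal("766.83")     # compromise subgroup
penalty_noncompromise = Decimal("232.22")  # non-compromise subgroup (system variation)

cost_per_cheater = penalty_compromise - penalty_noncompromise
print(cost_per_cheater)  # 534.61

# Outcome shares within each subgroup (failed, below required, at/above required).
outcomes = {
    "compromise": [Decimal("1.37"), Decimal("62.33"), Decimal("36.30")],
    "non-compromise": [Decimal("0.68"), Decimal("26.55"), Decimal("72.77")],
}
for group, shares in outcomes.items():
    print(group, sum(shares))  # 100.00 for each subgroup

# Subgroup sizes as percentages of the 1032 sampled students.
for n in (146, 885):
    print(f"{100 * n / 1032:.2f}%")  # 14.15%, then 85.76%
```

The displayed subgroup sizes sum to 99.91 percent rather than 100 because one sampled student, whose training was interrupted, falls in neither subgroup.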

The relative performance of the cheaters versus non-cheaters indicates the detrimental 
effect which compromise of the in-country ECL has on the efficiency of the instruction program at 
English Language Branch. 

In addition to this analysis, the time variability of compromise was investigated. It had 
been hypothesized that there would be a consistent trend, but this was not evident from the 
analysis, the results of which are shown in Table III. 

An additional analysis was made on a monthly basis for the students from Viet Nam, this being the only high-proportion cheating group for which a sufficient number of records existed. The computer result of this analysis was transmitted to the Technical Monitor; this additional analysis also failed to show a consistent time trend.



TABLE III. TIME TREND COMPROMISE DATA

                          Number        Number Non-   Total    Percent
                          Compromise    Compromise    Number   Compromise
Third Quarter 1968             4            21           25      16.0
Fourth Quarter 1968           31           151          182      17.0
First Quarter 1969            31           246          277      11.2
Second Quarter 1969           30           253          283      10.6
Third Quarter 1969            28           196          224      12.5
Fourth Quarter 1969*          22            18           40      55.0
Totals                       146           885         1031

*Student training interrupted.

Total Sample Size: 1032
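As a quick consistency check on Table III, the sketch below recomputes each quarter's percent compromise from the counts and sums the columns. It is illustrative only; the Fourth Quarter 1969 non-compromise count of 18 is inferred from that quarter's total (40) minus its compromise count (22), which also makes the column total agree with 885.

```python
# Table III counts: (quarter, number compromise, number non-compromise).
# The Fourth Quarter 1969 non-compromise count (18) is inferred as 40 - 22.
quarters = [
    ("Third Quarter 1968", 4, 21),
    ("Fourth Quarter 1968", 31, 151),
    ("First Quarter 1969", 31, 246),
    ("Second Quarter 1969", 30, 253),
    ("Third Quarter 1969", 28, 196),
    ("Fourth Quarter 1969", 22, 18),
]

def percent_compromise(compromised: int, non_compromised: int) -> float:
    """Percent of the quarter's students classified as compromisers (1 decimal)."""
    return round(100.0 * compromised / (compromised + non_compromised), 1)

for label, c, n in quarters:
    print(f"{label:20s} {c + n:5d} {percent_compromise(c, n):6.1f}")

total_c = sum(c for _, c, _ in quarters)
total_n = sum(n for _, _, n in quarters)
print("Totals", total_c, total_n, total_c + total_n)   # Totals 146 885 1031
```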
C. Analysis of the Cost of In-Country ECL Test Compromise for the Direct-Entry Program

1. General

It was planned at the outset of this research that an analysis similar to that performed for
English Language Branch students would be accomplished for students entering technical training
directly from their native country, so that results of the direct-entry analysis could be compared
with those from the English Language Branch analysis. Given data similar to those employed for the
English Language Branch analysis, the same criterion could be applied as a test for compromise, and
the same penalties could be applied to students whose ECLs did not meet the required ECLs. Students
who failed technical training and/or were rescheduled for additional English language schooling
could be assigned monetary penalties analogous to those applied in similar English Language Branch
cases.



However, it became apparent early in the research effort that not all of the input information
necessary to accomplish such an analysis was available in a centralized location, if it was
available at all. Through discussions with the Air Force Air Training Command (P8), it was learned
that the Air Training Command maintains no centralized records on direct-entry students and, in
fact, has no capability at any training command for English Comprehension Level testing. Further,
these discussions revealed that the Army, which does have a capability for testing English
Comprehension Levels, does not maintain a centralized file of records. In view of these findings, it
was clear that the analysis of the direct-entry program could be severely limited by the availability
of data.



2. Data Availability 

Examination of the files of English Language Branch disclosed the existence of a summarized
report (D1) on direct-entry students from fifteen Army Training Commands. Review of the report
indicated that only a limited analysis could be accomplished from the summary report, although the
source data had contained almost all of the data required for an analysis similar to that performed
for the English Language Branch training program. Unfortunately, these original source data had not
been retained at English Language Branch, nor were they available at the parent commands of
Defense Language Institute (P8, P9). Because of this lack of source data, two alternatives existed for
the analysis of the penalty from in-country ECL test compromise for the direct-entry program:

(1) To collect the original records from the fifteen Army commands, as was done for the
previous study, or

(2) To perform only a limited analysis of the available summary data resulting from the 
previous study. 

Because the first alternative was beyond the scope of the research effort, the second alternative was
chosen for the analysis of the direct-entry program (P10).
The summarized report provided the following data on 989 direct-entry students:

(1) The total number of students by country; 

(2) The average in-country ECL scores; 

(3) The average entry ECL scores; 
(4) The average difference between in-country and required ECL scores;

(5) The average difference between entry and required ECL scores;

(6) The average difference between in-country and entry ECL scores;

(7) A tabulation of the number of students whose in-country ECL did not meet the
required ECL and the number of students whose entry ECL did not meet the required
ECL; and

(8) A tabulation of the number of students whose in-country ECL was at least 18 points 
greater than entry ECL and the number of students whose entry ECL was at least 
18 points greater than their in-country ECL. 



3. Compromise and Penalty Criteria

Because the entry plus 2-week ECL test scores were not listed in the available data, the
criterion for cheating used previously in the English Language Branch analysis could not be applied
to the direct-entry data. Based primarily upon data availability, it was recommended and approved
(P10) that the new criterion for cheating would be a single-decision measure: a student would be
classified as a cheater if he obtained an in-country ECL test score 18 or more points higher than his
entry ECL score. We note that, among English Language Branch candidates in the group using new
test forms (see Fig. 1), and thus presumably representing non-cheaters, not one student had an
in-country/entry score differential as high as 18 points. In this respect, the single-decision point
criterion used for direct-entry students would appear to be more conservative than the original
two-decision point criterion. The new criterion tends to underestimate the cheating costs because it
uses an 18-point rather than the original 15-point in-country/entry ECL differential. On the other
hand, with respect to the first criterion, some overestimation of costs may occur with the new
criterion. If a student had “honestly” qualified in-country, forgotten (or otherwise lost) enough
English to score 18 points below his in-country score at entry, and regained sufficient comprehension
level to reduce his in-country/entry plus 2-weeks score differential to less than ten, he would have
been classified as a non-cheater by the criterion applied to English Language Branch students, but
would now be classified as a cheater under the direct-entry, single-decision point criterion.
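A minimal Python sketch of the single-decision criterion, assuming integer ECL scores. The function name and the example scores are hypothetical; only the 18-point threshold comes from the text.

```python
CHEATING_THRESHOLD = 18  # ECL points: in-country score minus entry score

def is_cheater_direct_entry(in_country_ecl: int, entry_ecl: int) -> bool:
    """Single-decision criterion for direct-entry students: classify as a
    cheater when the in-country score exceeds the entry score by 18 or more.
    No entry-plus-2-weeks score is consulted, unlike the earlier criterion."""
    return in_country_ecl - entry_ecl >= CHEATING_THRESHOLD

print(is_cheater_direct_entry(80, 62))  # True  (18-point differential)
print(is_cheater_direct_entry(80, 63))  # False (17-point differential)
```

Because only one score pair is examined, the “honest” student described above, who later regains his in-country level, is still classified as a cheater; that is the source of the possible overestimation.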

Because the distribution of entry ECL scores 18 or more points below the in-country
ECL score was not available from the report summary, a fixed 18-point (minimum) penalty was
assessed to all students in the “18-point or higher difference” category, in line with our conservative
approach. All students classified as cheaters by the single-point criterion were assigned penalties by
the same calculation process used for the English Language Branch analysis. Point deficits were
converted to required weeks of additional English language training according to Figure 2, and the
weeks of additional training were assigned a monetary value by multiplying by $150 per week per
student. For those students considered to have compromised their in-country ECL, the term ΔECL
was always taken equal to 18 points; therefore, for each student in this category, the resulting
compromise penalty is equal to $1080. This approach does not compute the penalties associated
with ECL point differentials in excess of 18. However, we understand that these excess points are
those most quickly gained during English language instruction, and this ameliorates the
understatement of the penalty (P2, P3, P4).

4. Analytical Logic



Based upon the delta training time formula developed in Figure 2, the cost penalties are 
computed by the following expression: 

(S)($ 1 5Q)(AECL) 

‘’''-(4ECL + 2)^„ 

10 

where 



S = No. of cheaters

For students who compromised their in-country ECL, ΔECL is, by definition, as previously
discussed, 18 points, and the penalty per student is $1080.
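Because Figure 2 is not reproduced here, the sketch below does not attempt the general ΔECL-to-weeks conversion; it only works backward from the stated figures. At $150 per week, the $1080 penalty for ΔECL = 18 corresponds to 7.2 weeks of additional training, and the total penalty for S cheaters is S × $1080. The names are hypothetical.

```python
WEEKLY_COST = 150           # dollars per week per student of added English training
PENALTY_PER_CHEATER = 1080  # dollars per cheater, as stated for a fixed deficit of 18 points

# Added training time implied by the stated figures (Figure 2 itself is not modeled).
implied_weeks = PENALTY_PER_CHEATER / WEEKLY_COST   # 7.2 weeks

def direct_entry_penalty(num_cheaters: int) -> int:
    """Total compromise penalty for S cheaters at the fixed per-student rate."""
    return num_cheaters * PENALTY_PER_CHEATER

print(implied_weeks, direct_entry_penalty(105))   # 7.2 113400
```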

5. Results of the Analysis of the Sample Population 

The results of the analysis are shown in Table IV. 



TABLE IV. DIRECT ENTRY PENALTY ANALYSIS

              Number of   Number        Percent       Total Compromise   Average Compromise
Country       Students    Compromised   Compromised   Cost Penalty       Cost Penalty
Argentina         8           1           12.5            $1,080              $135
Brazil           15           1            6.6             1,080                72
Colombia         11           3           27.3             3,240               294
Ethiopia         49          10           20.4            10,800               220
Greece           77           1            1.4             1,080                14
Guatemala         7           3           42.8             3,240               463
Iran             76           5            6.6             5,400                71
Italy             8           2           25.0             2,160               270
Jordan           31           3            9.7             3,240               104
Korea           114           6            5.3             6,480                57
Lebanon          17           1            5.9             1,080                64
Liberia          14           1            7.1             1,080                77
Thailand        157          26           16.6            28,080               178
Turkey           28           1            3.6             1,080                38
Viet Nam        220          41           18.6            44,280               201
Total           989         105           10.6          $113,400              $117
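The per-country entries in Table IV can be reproduced, within the table's rounding, from the enrollment and compromise counts at $1080 per compromising student. The sketch below is an illustrative check, not the original analysis program; it uses 220 for Viet Nam, the enrollment consistent with both that row's 18.6 percent and its $201 average, and it does not attempt to reproduce the Total row's average.

```python
PENALTY = 1080   # dollars per compromising direct-entry student

# Table IV: (country, students, compromised, reported %, reported average $).
rows = [
    ("Argentina", 8, 1, 12.5, 135),
    ("Brazil", 15, 1, 6.6, 72),
    ("Colombia", 11, 3, 27.3, 294),
    ("Ethiopia", 49, 10, 20.4, 220),
    ("Greece", 77, 1, 1.4, 14),
    ("Guatemala", 7, 3, 42.8, 463),
    ("Iran", 76, 5, 6.6, 71),
    ("Italy", 8, 2, 25.0, 270),
    ("Jordan", 31, 3, 9.7, 104),
    ("Korea", 114, 6, 5.3, 57),
    ("Lebanon", 17, 1, 5.9, 64),
    ("Liberia", 14, 1, 7.1, 77),
    ("Thailand", 157, 26, 16.6, 178),
    ("Turkey", 28, 1, 3.6, 38),
    ("Viet Nam", 220, 41, 18.6, 201),
]

for country, n, s, pct, avg in rows:
    total = s * PENALTY                      # total compromise cost penalty
    print(f"{country:10s} {100 * s / n:6.2f}% (reported {pct:4.1f})  "
          f"${total:>7,}  avg ${total / n:7.2f} (reported {avg})")

total_compromised = sum(s for _, _, s, _, _ in rows)
print(total_compromised, total_compromised * PENALTY)   # 105 113400
```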
These findings, which were based upon the assumptions described in Section C3, are reasonably
consistent with the findings of the English Language Branch analysis, as shown in Table V.

TABLE V. COMPARISON OF ESTIMATED COMPROMISE
FOR DIRECT-ENTRY AND ENGLISH LANGUAGE
BRANCH STUDENTS

                                  Percent Compromised
                          English Language     Direct-Entry
                          Branch Analysis      Analysis
Total                         14.15               10.6
Iran                           0.85                6.6
Korea                         23.97                5.3
Laos                          27.08               Zero
Libya                         10.42               Zero
Thailand                      23.33               16.6
Turkey                        12.82                3.6
Viet Nam                      27.31               18.6
$ Penalty Per Student
  Enrolled                   $75.63             $117.00

D. Summary of Penalty Estimates

Extrapolation of the English Language Branch penalties to past enrollment over the last 4 years
results in the listing of Table VI.

Extrapolation of the direct-entry analysis to the total program results in a cost penalty of $117,000
for every 1000 students. Based upon information available to the research team, there were
approximately 8000 students in the direct-entry program, and this program level converts to a
penalty of almost $1,000,000 per year.
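The extrapolation is simple arithmetic; the sketch below makes it explicit, assuming (as the “per year” figure implies) that the 8000 students represent annual direct-entry enrollment. Variable names are illustrative.

```python
penalty_per_student = 117.00   # dollars per direct-entry student enrolled (Table V)
annual_students = 8000         # approximate direct-entry enrollment, assumed annual

penalty_per_1000 = penalty_per_student * 1000
annual_penalty = penalty_per_student * annual_students
print(f"${penalty_per_1000:,.0f} per 1000 students; ${annual_penalty:,.0f} per year")
# $117,000 per 1000 students; $936,000 per year
```

The $936,000 result is the “almost $1,000,000 per year” quoted above.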



TABLE VI. ENGLISH LANGUAGE BRANCH PENALTIES

                       Fiscal Year   Fiscal Year   Fiscal Year   Fiscal Year   Four-Year
                       1967          1968          1969          1970          Total
Number of Students
  Compromised*         294           355           355           535           1539
Compromise Cost†       $157,000      $189,000      $189,000      $286,000      $821,000

*Based on the average (14%) compromise in the in-country ECL test.
†Based on the average penalty cost ($534.61) per student classified as “compromiser.”
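The cost row of Table VI is consistent with multiplying each year's compromised count by the $534.61 average penalty and truncating the product to whole thousands of dollars; the four-year total then equals the sum of the yearly figures ($821,000), whereas multiplying 1539 by $534.61 directly would give about $822,800. The truncation rule is an inference from the numbers, not something the report states, and the names below are hypothetical.

```python
AVERAGE_PENALTY = 534.61   # dollars per student classified as a compromiser

# Table VI: students classified as compromisers, by fiscal year.
compromised = {1967: 294, 1968: 355, 1969: 355, 1970: 535}

def cost_in_thousands(count: int) -> int:
    """count x average penalty, truncated to whole thousands of dollars."""
    return int(count * AVERAGE_PENALTY // 1000) * 1000

yearly = {fy: cost_in_thousands(n) for fy, n in compromised.items()}
four_year_students = sum(compromised.values())
four_year_cost = sum(yearly.values())   # sum of the truncated yearly figures

print(yearly)
print(four_year_students, four_year_cost)   # 1539 821000
```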



E. Personal Interviews and Telephone Contacts

P1 Project Team Meeting with Senior Personnel of the English Language Branch, Defense
Language Institute, Lackland Air Force Base, San Antonio, Texas, May 11, 1970.

P2 Project Team Meeting with Chief, General English Section, English Language Branch,
Defense Language Institute, Lackland Air Force Base, San Antonio, Texas, May 13, 1970,
and July 24, 1970.

P3 Project Team Meeting with Chief, Specialized English Section, English Language Branch,
Defense Language Institute, Lackland Air Force Base, San Antonio, Texas, May 13, 1970.

P4 Project Team Meeting with Chief, I & M Section, English Language Branch, Defense
Language Institute, Lackland Air Force Base, San Antonio, Texas, May 13, 1970.

P5 Project Team Meeting with Adjutant, English Language Branch, Defense Language
Institute, Lackland Air Force Base, San Antonio, Texas, May 13, 1970.

P6 Project Team Telephone Conversation with Deputy Commander, Air Force Air Training
Command, Randolph Air Force Base, San Antonio, Texas, May 15, 1970.

P7 Project Team Meeting with Chief, Tests and Measurements Branch, English Language
Branch, Defense Language Institute, Lackland Air Force Base, San Antonio, Texas,
July 24, 1970.

P8 Project Team Telephone Conversation with DCS/OPS, Defense Language Institute,
Washington, D.C., July 24, 1970.

P9 Project Team Telephone Conversation with Headquarters, USCONARC, Fort Monroe,
Virginia, July 29, 1970.

P10 Project Team Meeting with Chief, Development Division, and Chief, Tests and Measurements
Branch, English Language Branch, Defense Language Institute, Lackland Air Force Base,
San Antonio, Texas, August 31, 1970.
III. TASK I EFFORT: COMPUTER-GENERATED TESTS (CGTs)



A. Overview



The overall objective of Task I was to develop the necessary conceptual tools and computer
programs to enable a digital computer to generate valid ECL test item lists in quantity. To
accomplish this objective, we acquired from English Language Branch an initial set of test items
derived from 46 operational ECL test forms, organized these test and item data for computer
acquisition, and stored the data. We next developed three auxiliary programs to analyze various
aspects of the acquired data and, using these programs, investigated the characteristics of the stored
items and DLIEL-ECL forms.

An ECL test generation computer methodology was defined. A program was written, and
prototype ECL test item lists were generated by the computer and transmitted to the Sponsor. At
English Language Branch, ECL test forms were typed and produced from the CGT lists and
evaluated by the Sponsor’s usual validation methods. In addition, an “update” program was
developed at SwRI which would, on demand, add, delete, update, or correct data pool test items
and prepare a report documenting and analyzing the update operations. The major computer
programs, along with other appropriate documentation and supplementary descriptive textual
materials, were delivered to the Project Technical Monitor.

B. Data Acquisition

1. Categorization

English Language Branch uses a quite complex scheme for categorizing the test items and
defining test specifications. After a thorough study of item categorization and test content
specifications, we concluded that the specifications could be conveniently defined and dealt with in
terms of four separate, independent partitions of the total item data pool (our universe set). We
recall that a partition of a universe set is a division of that universe into mutually exclusive and
collectively exhaustive subsets; that is, a partition divides the universe set in such a way that every
item (member) of the universe belongs to one and only one subset of a partition.

Table VII shows the four basic partitions, the subsets in each partition, and the code symbols
assigned to the subsets. The categorical properties of any given test item are completely specified by
one combination of four descriptors, one and only one descriptor being chosen from each partition.
There is only one unique set of descriptors which is correct for any one given item, because the
categories describing an item are uniquely assigned to that item at English Language Branch.

TABLE VII. ECL ITEM POOL PARTITIONS,
CATEGORIES, AND CODES

Set         Set Property                Subcategories             Code
Partition   Considered
#1          Modality of Presentation    Aural Comprehension       AC
                                        Reading Comprehension     RC
#2          Form of Presentation        Question                  QU
                                        Statement                 ST
                                        Dialogue                  DG
                                        Completion                CN
                                        Underlined                UN
#3          Lexical or Structural       Vocabulary                VO
            Subsets                     Idiomatic Expression      ID
                                        Comparatives, etc.        CO
                                        Modals                    MO
                                        Prepositions              PR
                                        Infinitives               IN
                                        Gerunds                   GE
                                        Participles               PA
                                        Verb Form                 VF
                                        Verb Tense                VT
                                        Verb Passive              VP
                                        Word Order                WO
                                        Complex Sentence          CS
#4          Source Reference (Book)     Elementary                11, 12, 13, 14
                                          Unspecified             00
                                        Intermediate              21, 22, 23, 24
                                          Other                   25
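The partition property can be checked mechanically. The Python sketch below (function names hypothetical) tests whether a family of subsets is mutually exclusive and collectively exhaustive over an item pool, and returns the single subset an item belongs to; applying the second function once per partition would yield an item's four-part descriptor.

```python
from typing import Dict, Hashable, Set

def is_partition(universe: Set[Hashable], subsets: Dict[str, Set[Hashable]]) -> bool:
    """True if the subsets are mutually exclusive and collectively exhaustive."""
    covered: Set[Hashable] = set()
    for members in subsets.values():
        if covered & members:          # some item would belong to two subsets
            return False
        covered |= members
    return covered == universe         # every item covered, nothing extra

def subset_of(item: Hashable, subsets: Dict[str, Set[Hashable]]) -> str:
    """Name of the one subset containing the item (its descriptor for this partition)."""
    matches = [name for name, members in subsets.items() if item in members]
    if len(matches) != 1:
        raise ValueError(f"item {item!r} lies in {len(matches)} subsets, expected 1")
    return matches[0]

# Toy pool of four item serial numbers split by partition #1 (modality).
modality = {"AC": {1, 2}, "RC": {3, 4}}
print(is_partition({1, 2, 3, 4}, modality), subset_of(3, modality))  # True RC
```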

We note that not all logically possible combinations of category codes are acceptable,
valid test item descriptors. Some combinations are never used, and no current test item is correctly
described by such combinations. The subject of category combinations will be treated in greater
detail in later sections of this report (Sections III.C, D, and F).

2. Transmittal Procedure from English Language Branch



English Language Branch staff prepared a typewritten listing of test items from available
DLIEL-ECL forms. The listing contained the following information on each test item:

• Serial identification number assigned to that item

• A four-element (four-level) categorical code descriptor

• Objective

• Answer key

• Average value of the ease index and count (a digit showing the number of previous
test administration sessions from which the index was derived)

• Average value of the discrimination index and count

• Transmittal date, and

• Code for the DLIEL-ECL test form from which the item was obtained.



The reasons for including a “count” datum with the item index information, and the use made of
that count, will be explained in the section which discusses item updating (Section III.F).

Before proceeding with computer processing of the acquired items, we checked the
transmitted data visually to ensure that the serial identification number sequence was consistent,
that the item category combinations (mentioned in the previous section) were valid, and that there
were no missing elements in any of the test items. Any apparent item discrepancies were resolved
through discussions with the English Language Branch staff. The verified item data were entered on
punched cards at SwRI.
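These checks were made visually; purely as an illustration, the sequence and completeness checks could be expressed in code. In this sketch the record layout and field names are hypothetical, and a “consistent” serial sequence is assumed to mean consecutive numbers with no gaps or repeats. Category-code validity is left to a separate check.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple

@dataclass
class ItemRecord:
    """One transmitted test item (hypothetical field names)."""
    serial: int
    category: Tuple[str, str, str, str]   # four-element categorical code
    objective: Optional[str]
    answer_key: Optional[str]
    ease_index: Optional[float]
    ease_count: Optional[int]             # sessions the ease index was averaged over
    disc_index: Optional[float]
    disc_count: Optional[int]
    transmittal_date: Optional[str]
    source_form: Optional[str]            # DLIEL-ECL form the item came from

def check_records(records: List[ItemRecord]) -> List[str]:
    """Return human-readable discrepancies to be resolved with the item source."""
    problems = []
    serials = sorted(r.serial for r in records)
    # Assumption: consistent numbering means consecutive serials, no gaps or repeats.
    for prev, cur in zip(serials, serials[1:]):
        if cur != prev + 1:
            problems.append(f"serial sequence break between {prev} and {cur}")
    for r in records:
        missing = [f.name for f in fields(r) if getattr(r, f.name) is None]
        if missing:
            problems.append(f"item {r.serial}: missing {', '.join(missing)}")
    return problems
```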

C. Data Pool Analysis

The initial data pool was assembled from 46 DLIEL-ECL forms containing 5111 test item
questions. The kinds of data that were furnished to SwRI on each item, and on the ECL forms, have
been listed in the preceding section. We proceeded to analyze these data in order to identify any
unique or idiosyncratic characteristics related to ECL test reliability in particular. If we could learn
more about the characteristics that accounted for the good performance of the DLIEL-ECL forms,
then we could, by selectively duplicating these important characteristics, hope to assemble
computer-generated ECL test (CGT) forms of comparable quality of performance.
Because of the large size of the data pool, the complexity of the analysis task, and the availability of
the item pool data in punched card format, it was economical and efficient to perform the analyses
by computer. We used three analysis programs: a “scan” or “check item data” program, a statistical
program, and a histogram plotting program.

The “scan” program, for each test, 

(1) Identified and counted the out-of-range items (ease index greater than or equal to 0.94, 
or less than or equal to 0.37), 

(2) Calculated the test mean EI and DI (ease and discrimination indexes),

(3) Checked for coding misprints in the category descriptors,

(4) Tabulated the distribution of answers, and

(5) Counted the category distributions. 
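A rough Python sketch of the five “scan” functions for a single test. The item dictionary keys are hypothetical; the 0.94 and 0.37 ease-index limits and the category codes come from the text and Table VII. This illustrates the logic only and is not the SwRI program.

```python
from collections import Counter
from statistics import mean

EI_HIGH, EI_LOW = 0.94, 0.37   # out-of-range ease-index limits used in the initial scan

# Valid codes per partition (Table VII); partition #4 uses book codes.
CODE_SETS = (
    {"AC", "RC"},
    {"QU", "ST", "DG", "CN", "UN"},
    {"VO", "ID", "CO", "MO", "PR", "IN", "GE", "PA", "VF", "VT", "VP", "WO", "CS"},
    {"00", "11", "12", "13", "14", "21", "22", "23", "24", "25"},
)

def scan_test(items):
    """Summarize one test. Each item is a dict with hypothetical keys
    'ei', 'di', 'answer', and 'category' (a four-code tuple)."""
    out_of_range = [it for it in items if it["ei"] >= EI_HIGH or it["ei"] <= EI_LOW]
    misprints = [
        i for i, it in enumerate(items)
        if len(it["category"]) != len(CODE_SETS)
        or any(code not in valid for code, valid in zip(it["category"], CODE_SETS))
    ]
    return {
        "out_of_range": len(out_of_range),                                 # (1)
        "mean_ei": round(mean(it["ei"] for it in items), 3),               # (2)
        "mean_di": round(mean(it["di"] for it in items), 3),
        "misprints": misprints,                                            # (3) positions
        "answers": Counter(it["answer"] for it in items),                  # (4)
        "categories": Counter(c for it in items for c in it["category"]),  # (5)
    }
```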

Figure 3 shows a sample printout of this program. 



The statistical program summarized test statistics for EI and DI. For each index, statistics were
calculated for the total test, aural category, and reading category; the program also assembled tables
of distributions for each statistical subanalysis and printed out an ordered array of the item index
values. A sample printout is shown in Figure 4. The printout is shown for EI and the test as a whole;
printouts for other breakdowns (for example, DI/aural comprehension) are identical in format to
the sample figure.

The histogram program printed histograms for the EI and DI distributions by test. Figure 5
shows sample printouts of this program.
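A text-mode histogram of the kind such a program might print can be sketched as follows. The bin count, bar width, and output format are assumptions, since the actual printouts are not reproduced here; index values are assumed to lie in [0, 1].

```python
def text_histogram(values, lo=0.0, hi=1.0, bins=10, width=40):
    """Return (lines, counts) for a fixed-bin text histogram of index values.

    Values equal to `hi` go in the last bin; values are assumed to lie in [lo, hi].
    """
    counts = [0] * bins
    for v in values:
        k = int((v - lo) / (hi - lo) * bins)
        counts[min(max(k, 0), bins - 1)] += 1
    step = (hi - lo) / bins
    peak = max(counts) or 1          # avoid dividing by zero for empty input
    lines = []
    for k, c in enumerate(counts):
        left = lo + k * step
        bar = "*" * round(width * c / peak)
        lines.append(f"{left:4.2f}-{left + step:4.2f} {c:4d} {bar}")
    return lines, counts

lines, counts = text_histogram([0.05, 0.55, 0.56, 0.95, 1.0])
print("\n".join(lines))
```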

The above three programs were initially applied to the 46 DLIEL-ECL tests, as well as to the total
item pool, treating all 5111 items as one large test. These programs were subsequently also useful in
analyzing the characteristics of our CGT forms.

The analyses performed by these programs generated a sizable body of quantitative information.
We will introduce these data, sometimes in summary form and at other times in detail, as
needed in the technical discussion that follows. In particular, the discussion of the next two sections
will rely frequently on the information furnished by these data analyses.

D. CGT Methodology 

1. Approaches



The major conceptual problems of the Task I effort concerned the definition of an effective
computer test item assembly methodology. While English Language Branch requirements imposed
certain specific and, for the duration of the current project, firm constraints (to be discussed in the
next section), there remained a significant amount of leeway in the conceptual development of the
assembly program. Consequently, starting with the premise that the general selection methodology
would be based on randomization, the computer program development dealt with defining and
choosing among admissible conceptual alternatives.
FIGURE 3a. SAMPLE PRINTOUT OF “CHECK DATA” PROGRAM: ITEM CHECK
FIGURE 3b. SAMPLE PRINTOUT OF “CHECK DATA” PROGRAM: CATEGORY DISTRIBUTION
FIGURE 4a. SAMPLE PRINTOUT OF STATISTICAL PROGRAM
FIGURE 4c. SAMPLE PRINTOUT OF STATISTICAL PROGRAM
FIGURE 5a. SAMPLE PRINTOUT OF HISTOGRAM PROGRAM
FIGURE 5c. SAMPLE PRINTOUT OF HISTOGRAM PROGRAM
FIGURE 5d. SAMPLE PRINTOUT OF HISTOGRAM PROGRAM



Ideally, when evaluated according to present English Language Branch criteria, a CGT
form should perform at least as well as the operational DLIEL-ECL forms now in use. Therefore, a
vital question that had to be considered before choosing a CGT methodology was whether or not
the ECL tests at English Language Branch might possess properties which, although unspecified and
as yet unidentified, were contributing in important ways to the excellent performance
characteristics of the DLIEL-ECL forms. In other words, were there any other identifiable ECL test
form characteristics that would be vital for us to copy in order to obtain CGTs whose performance
would be at least equivalent to the ECL forms’ performance?

It was not possible to obtain conclusive answers to these questions from experimental
investigations, for several reasons: controlled studies of the effects on test performance of varying
the test parameters were beyond the program scope; such studies would also have been difficult to
implement at English Language Branch, since such investigations would place significant additional
burdens on the staff, affect student schedules, and, by increasing exposure of operational test
forms, make those forms more vulnerable to compromise. We therefore turned to a study of the
characteristics of the 46 ECL forms which were the source of our computer data item pool, hoping
that analysis and study of the test characteristics would provide helpful guidelines for achieving
good CGTs. Using the computer analysis programs discussed in Section III.C, we obtained summary
data on the performance history of the 46 tests; tabulations were obtained for each of the 46 tests,
showing the number of items in the test, the number of items with an ease index equal to or less
than 0.37, the number of items with an ease index equal to or greater than 0.94 (at the time of this
initial analysis, these constituted “acceptable” limits, but these limits were modified later, as will be
discussed in Sections F and G), the total number of items falling outside the acceptable EI range,
the mean ease index and discrimination index, and the type of ease index distribution (obtained
from inspection of the computer-generated histogram). The test analysis data are summarized in
Table VIII.



We had expected that the results of our DLIEL-ECL test analysis would guide us toward
suitable computer methodologies for assembling ECL forms. The formal content specifications
defined at English Language Branch would have to be satisfied in any case, but we expected that the
analyses would indicate the desirability of certain additional sampling or statistical constraints.
Contrary to these expectations, our study of the DLIEL-ECL forms failed to identify any new
critical test form characteristics; for all the investigated parameters (as summarized in Table VIII),
there were significant variations between forms.

However, we knew that these forms perform very well in spite of their apparent
dissimilarities. We also knew that, in any case, the moderate size of our item pool would not allow
excessive constraints to be imposed on the assembly procedure; too severe constraints would make
the generation of complete (120-item) CGT form lists difficult. We therefore chose, after several
extensive discussions with English Language Branch staff, to generate tests with a constraint
methodology based only on the current English Language Branch formal content specifications.
Thus, our first set of prototype CGTs would be assembled without any additional conceptual
constraints on the test generation methodology. We would rely on the statistical properties of the
item pool and on quasi-random* sampling to achieve satisfactory test form parameter values and
distributions. We decided to generate a set of CGT prototypes, submit them for a standard
evaluation at English Language Branch, and, should the evaluation so dictate, subsequently modify
our initial approach.



*We use “quasi-random” to signify that at any one stage of test item assembly, all items in the data
pool have an equal chance of being selected by the program; however, the item categories from
which admissible item candidates may be acquired decrease as the assembly of a given form
progresses. This point will be discussed in Section D3.
TABLE VIII. SUMMARY OF DLIEL-ECL TEST FORMS
2. Constraints and Content Specifications



It was decided early in the program that a CGT must, at minimum, strictly meet English
Language Branch content specifications for a 120-item test. The current content specifications are
shown in Table IX.

TABLE IX. CONTENT SPECIFICATIONS FOR CGT (120-ITEM TEST)

Item Type                     Model ECL Test
Listening                     75 Items
   Questions                  30 Items
   Statements                 30 Items
   Dialogues                  15 Items
Reading                       45 Items
Vocabulary                    72 Items
   Elementary                 29 Items
   Intermediate               43 Items
Idioms                        18 Items
Structural Items              30 Items
   Comparative                 1 Item
   Modal                       4 Items
   Preposition                 3 Items
   Infinitive                  2 Items
   Gerund                      2 Items
   Participial                 2 Items
   Verb Form                   3 Items
   Verb Tense                  3 Items
   Verb Passive                3 Items
   Word Order                  3 Items
   Complicated Sentence        4 Items

In order to discover the effect that these specifications have on category sums, we
compare Tables VII and IX. This comparison shows that the content specifications in some
instances impose specific numerical sums requirements on subsets of a particular partition, while in
other cases they impose numerical sums requirements on set intersections between subsets from
different partitions. First, to explain the four partitions (previously referred to in Section III.B), we
note, with reference to Table VII, that partition #1 identifies each item as belonging to either the
AC or RC subcategory; partition #2 identifies each item as belonging to one of five subcategories
(QU, ST, DG, CN, UN); partition #3 identifies each item as belonging to one of 13 categories (VO, ID,
CO, MO, PR, IN, GE, PA, VF, VT, VP, WO, CS); and partition #4, as used at English Language Branch
at this time, identifies each item as being either from an elementary or intermediate book. The
category membership of an item is completely specified by assigning the item four descriptors, one
and only one descriptor from each partition, each descriptor showing to which subset of a partition
the item belongs. The set properties of each test item are therefore fully specified by one and only
one combination of four code symbols.

We can now interpret the specifications shown in Table IX in set terminology. The
aural/reading requirement is a simple sums requirement on partition #1. The listening/(questions,
statements, dialogues) requirement is a sums requirement on the intersection of one subset from
partition #1 (AC) with three of the five subsets of partition #2 (QU, ST, DG). The vocabulary
requirement is a simple sums requirement on a subset of partition #3 (VO). The
vocabulary/(elementary, intermediate) requirement is a sums requirement on the intersection of a
subset of partition #3 (VO) with the subsets of partition #4 (elementary, intermediate). The idioms
requirement, as well as the requirements on the structural items, are simple sums requirements on
the remaining subsets of partition #3.

We note in passing that there are redundancies in the content specifications of Table IX,
since some of the subset requirements are sufficient to specify certain superset sums. For example,
since the form of each listening item (AC) is either a question, statement, or dialogue, the three
subset sum specifications on AC/(QU, ST, or DG) serve to specify the sum total of the AC items.
Other redundancies arise for analogous reasons. The redundancies can serve as a numerical check on
the content specifications.
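The redundancies in the content specifications can be checked mechanically. The sketch below encodes the Table IX counts (dictionary keys are hypothetical labels) and verifies that the subset sums reproduce the superset totals, including the fact that partition #3 (vocabulary, idioms, structural items) must account for all 120 items.

```python
TEST_LENGTH = 120

# Table IX counts (keys are hypothetical labels).
SPEC = {
    "listening": 75, "questions": 30, "statements": 30, "dialogues": 15,
    "reading": 45,
    "vocabulary": 72, "vocab_elementary": 29, "vocab_intermediate": 43,
    "idioms": 18, "structural": 30,
}
STRUCTURAL = {
    "comparative": 1, "modal": 4, "preposition": 3, "infinitive": 2,
    "gerund": 2, "participial": 2, "verb_form": 3, "verb_tense": 3,
    "verb_passive": 3, "word_order": 3, "complicated_sentence": 4,
}

def redundancy_checks():
    """Each entry compares a superset total with the sum of its subsets."""
    s = SPEC
    return {
        "listening = questions + statements + dialogues":
            s["listening"] == s["questions"] + s["statements"] + s["dialogues"],
        "listening + reading = 120": s["listening"] + s["reading"] == TEST_LENGTH,
        "vocabulary = elementary + intermediate":
            s["vocabulary"] == s["vocab_elementary"] + s["vocab_intermediate"],
        "structural = sum of structural types":
            s["structural"] == sum(STRUCTURAL.values()),
        "vocabulary + idioms + structural = 120":
            s["vocabulary"] + s["idioms"] + s["structural"] == TEST_LENGTH,
    }

for check, ok in redundancy_checks().items():
    print("OK " if ok else "BAD", check)
```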



The number of distinct subset combinations logically obtainable from the four partitions
is 260 (2 × 5 × 13 × 2); however, the English Language Branch content specifications define
requirements for only 23 (and these can be restated in terms of only 18 sums, as will be seen later
from Figure 6) of the 260 possible combinations. Furthermore, the logically possible number of
260 subcategory combinations cannot be fully used because certain subset combinations are not
used at English Language Branch at this time; for example, since all currently used listening (aural)
items are lexical (either vocabulary or idiom) items, any combination of item descriptors including
listening and structural (non-lexical) subcategories would, at present, be considered an invalid
category combination. There remain, after removal of the at-present “forbidden” categories, 142 valid
categorical subset descriptors. Since only 23 subset sums are firmly specified, this leaves a large number
of subcategory sums unspecified for each ECL test form. We could have assigned specific sums to these
subcategories, but, in line with the discussion of the preceding section, we chose to satisfy only the
sums requirements defined at English Language Branch, allowing fluctuations in the other
subcategory sums to occur as a consequence of the sampling procedure and item pool composition.
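The combination count can be reproduced by enumeration. The first exclusion rule below (listening items must be lexical) is the one the text states; the second (listening items use only the question, statement, and dialogue forms) is an assumption suggested by Table IX. Together they happen to leave exactly 142 combinations, but the report does not list its full set of forbidden combinations, so these rules are illustrative rather than authoritative.

```python
from itertools import product

P1 = ["AC", "RC"]
P2 = ["QU", "ST", "DG", "CN", "UN"]
P3 = ["VO", "ID", "CO", "MO", "PR", "IN", "GE", "PA", "VF", "VT", "VP", "WO", "CS"]
P4 = ["elementary", "intermediate"]

LEXICAL = {"VO", "ID"}
LISTENING_FORMS = {"QU", "ST", "DG"}

def is_admissible(combo) -> bool:
    """Hypothetical validity rules for a four-part descriptor."""
    modality, form, content, _level = combo
    if modality == "AC" and content not in LEXICAL:
        return False   # stated rule: listening items are vocabulary or idiom items
    if modality == "AC" and form not in LISTENING_FORMS:
        return False   # assumed rule: listening items are questions, statements, dialogues
    return True

all_combos = list(product(P1, P2, P3, P4))
admissible = [c for c in all_combos if is_admissible(c)]
print(len(all_combos), len(admissible))   # 260 142
```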
FIGURE 6. CGT COMPUTER PROGRAM NETWORK AND
BRANCH TOTAL COUNTS 



3. Generation of the First Set 



The main function of the computer test assembly program is to assemble test form lists of
120 items each, each form being constructed to satisfy the content specifications described in the
preceding section. Each set of 120 items is generated independently, in the sense that the computer
program begins generation of each new form anew, “without memory” of its previous generation
history. The content specifications are met by using a “pipe flow” structure consisting of a network
of diverging and converging branches in which each branch has a predetermined total count setting
which reflects the number of items of a given subset required to satisfy the content specifications.
As the test form is assembled, a counter in each branch counts the items that have already
passed through it. When the required total has been reached for that branch, the program
rejects all items that would normally have been routed through the now-closed branch.

The program samples the item pool randomly, selects an item, and attempts to pass
it through the pipe network. Each item’s set of four category descriptors (see Table VII)
uniquely specifies the item’s path through the network. If all the pipes for that item are open, the
item is accepted for the CGT form, and the program selects another item. On the other hand, if any
branch of the network is closed to that item, or if the item has already been acquired for this form,
the computer program rejects the item and samples the data base to acquire the next item. The
closing of a branch, by rejecting a certain set of candidate items, in effect restricts and
reduces the item pool available for the remainder of the CGT form assembly. The program network,
together with the prescribed branch total counts, is shown in Figure 6.

By way of example, suppose we were sampling an “AC QU VO1” item and that 28
“VO1” items had been previously accepted in this test generation run. Then, assuming that the
current “AC QU” subtotal was less than 30 and that the candidate item had not already been
acquired for the partially assembled test, the item would be accepted. The “VO1” total would be
increased to 29, closing that branch. (The “AC QU” total would also be raised by one.) Thereafter,
any sampled item that had “VO1” as part of its descriptor would be rejected.
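The branch-counter rule and the “AC QU VO1” example above can be illustrated with a short Python sketch. It is not the SwRI FORTRAN IV program: the class name `BranchNetwork`, the simplification of an item’s path to a tuple of two branch names, and the “AC ST” quota of 30 are assumptions for illustration; only the AC QU quota of 30 and the VO1 quota of 29 come from the example.

```python
class BranchNetwork:
    """Hypothetical sketch of the CGT "pipe flow" network: one counter per branch."""

    def __init__(self, quotas):
        # quotas: branch name -> required total for one test form
        self.quotas = dict(quotas)
        self.counts = {branch: 0 for branch in quotas}

    def is_open(self, branch):
        # A branch stays open until its counter reaches the required total.
        return self.counts[branch] < self.quotas[branch]

    def try_accept(self, path):
        """Accept an item only if every branch on its path is still open."""
        if not all(self.is_open(branch) for branch in path):
            return False  # at least one branch is closed: reject the item
        for branch in path:
            self.counts[branch] += 1  # the item "flows" through each branch
        return True


# The worked example: 28 VO1 items already accepted, AC QU subtotal below 30.
net = BranchNetwork({"AC QU": 30, "AC ST": 30, "VO1": 29})  # AC ST quota assumed
net.counts["VO1"] = 28
print(net.try_accept(("AC QU", "VO1")))  # True
print(net.counts["VO1"], net.counts["AC QU"], net.is_open("VO1"))  # 29 1 False
print(net.try_accept(("AC ST", "VO1")))  # False: any VO1 item is now rejected
```

Accepting an item only when every branch on its path is open is what makes a closed branch shrink the effective pool for the rest of the run.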

In addition to assembling the desired category totals, the computer program keeps a
count of the number of items rejected and terminates a CGT form assembly run when a
predetermined number of rejections has occurred. This feature may be used to improve test
generation efficiency; it also acts as a safeguard against anomalous conditions that could result in
costly, unproductive, uncontrolled use of computer time.
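A complete assembly run, with the duplicate check and the rejection cutoff, might look like the following Python sketch. It is an illustration under stated assumptions rather than the original program: items are assumed to be keyed by serial number with a tuple of branch names as their path, sampling is assumed to be uniform with replacement, and the name `assemble_form` and the returned statistics (which mirror Lines (5) to (7) of the study format) are hypothetical. Only category rejections count toward the cutoff, as in the CGT runs. The loop also stops if every pool item has been accepted, so a toy pool smaller than a form cannot loop forever.

```python
import random


def assemble_form(pool, quotas, form_size, max_rejects, rng):
    """Hypothetical sketch of one CGT form assembly run.

    pool: dict serial number -> tuple of branch names on the item's path
    quotas: dict branch name -> required total for one form
    Returns (accepted serials, stats); the form is incomplete if fewer than
    form_size items were accepted before max_rejects category rejections.
    """
    counts = {branch: 0 for branch in quotas}  # fresh counters: no memory of earlier runs
    accepted, taken = [], set()
    stats = {"sampled": 0, "category_rejects": 0, "duplicate_rejects": 0}
    serials = list(pool)
    while (len(accepted) < form_size
           and stats["category_rejects"] < max_rejects
           and len(taken) < len(serials)):
        serial = rng.choice(serials)  # uniform sampling with replacement
        stats["sampled"] += 1
        if serial in taken:
            stats["duplicate_rejects"] += 1  # already on this form
            continue
        path = pool[serial]
        if any(counts[b] >= quotas[b] for b in path):
            stats["category_rejects"] += 1  # a branch on the path is closed
            continue
        for b in path:
            counts[b] += 1
        taken.add(serial)
        accepted.append(serial)
    return accepted, stats


# Toy usage: 40 "AC QU VO1" items, 20 "AC QU ID" items, a 6-item form.
toy_pool = {s: ("AC QU", "VO1") for s in range(1, 41)}
toy_pool.update({s: ("AC QU", "ID") for s in range(41, 61)})
form, run_stats = assemble_form(
    toy_pool, {"AC QU": 6, "VO1": 4, "ID": 2}, 6, 1000, random.Random(1))
print(len(form), run_stats)
```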

Two formats of hard copy output are available. The first is a detailed listing, showing
various steps and events in the test generation sequence, for SwRI’s study and evaluation of the
program. The second arranges the 120 test items in the format in which they are transmitted to English
Language Branch. In the latter format, the items are arranged into a group of 75 aural comprehension
items and a group of 45 reading comprehension items, and are sorted within each group in order
of their serial (item identification) number. The format shows the item serial number, category code,
ease index and count, discrimination index and count, answer key, item objective, new test (CGT)
number, original test (DLIEL-ECL) number*, and the date of item acquisition into the data item
pool. A sample transmitted output page is shown in Figure 7. The format printed for study and
evaluation at SwRI is discussed in the next section.



*This is English Language Branch’s code number for the DLIEL-ECL test form in which the item happened to be used when the item
was entered into the item pool at SwRI.
FIGURE 7. SAMPLE CGT PRINTOUT OF TRANSMITTAL FORMAT
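The ordering used in the transmittal format (75 aural items, then 45 reading items, each group sorted by serial number) can be sketched as follows. For illustration the sketch assumes that an item is aural when its first descriptor is “AC” and a reading item otherwise. The function name and the (serial, first descriptor) pair layout are hypothetical.

```python
def transmittal_order(items, n_aural=75, n_reading=45):
    """Return serial numbers in transmittal order: aural group, then reading group.

    items: list of (serial_number, first_descriptor) pairs; a first descriptor
    of "AC" is assumed to mark an aural item, anything else a reading item.
    """
    aural = sorted(serial for serial, first in items if first == "AC")
    reading = sorted(serial for serial, first in items if first != "AC")
    if (len(aural), len(reading)) != (n_aural, n_reading):
        raise ValueError("form does not have the expected 75/45 split")
    return aural + reading
```

The split check is a design choice for the sketch: a form that fails it points to an assembly or descriptor error rather than to a sorting problem.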



The CGT preparation comprises the following steps:



(1) The program assembles a test form. 

(2) The form is printed out in the study format. 

(3) The “statistics” analysis program (see Section C) is applied to the form. 

(4) Representative forms are selected for transmission to the Sponsor. 

(5) A punched card output of the selected forms is prepared by computer. 

(6) The punched cards for each CGT form are sequenced and categorized on a sorter to
order the items into the sequence most convenient to the Sponsor (as selected by him).

(7) The CGT card sequence is printed, and the listing transmitted to the Sponsor. 



4. Computer Program Evaluation and Analysis at SwRI 

Seventeen completed CGTs were assembled in mid-August. A summary of their major
characteristics is presented in Table X, which follows the same general format as Table VIII. A
detailed discussion of the comparisons between the CGTs and DLIEL-ECLs, based on English
Language Branch test results, will be presented in Section III.E.

On comparing Tables VIII and X, it appears that in several respects the CGT forms are 
more uniform than the ECL test forms. Between CGT forms, there is less variability in the number 
of out-of-index range items and in the distribution of the test mean ease and discrimination indexes. 
Also, all the CGT forms have the same total number of items and identical content distributions 
(per Table VII), since these characteristics are predetermined by the computer program, whereas the 
corresponding aspects of the ECL tests are variable. The greater uniformity of the CGT forms 
reflects the acquisition of items by near-random sampling of the large item pool and the firm 
content constraints imposed by the assembly methodology. 



To aid SwRI’s study of the test generation process, each CGT was printed out in the format
shown in Figure 8. This format supplies a detailed account of the events occurring during a test
generation run. The format provides the following information (see Figure 8):




• Line (1) shows the heading for the test items (identification number, category code,
ease and discrimination index, answer key, objective, DLIEL-ECL source, and item
acquisition date);

• Line (2) shows a typical item acquired for this specific CGT form;

• Line (3) is an intermediate summary printout that appears when one of the
category sums requirements has been completed with the incorporation of the last
acquired item; the line shows that, to this point, 82 items had been rejected because
a sum requirement related to their category membership had already been satisfied,
a total of 192 items had been sampled, 110 items had been accepted, and the last
item accepted was item Serial No. 4589, categorical descriptor RC ST CS Book 1;

• Line (4) shows the category sum totals still required at this point (the key explaining
the 20 positions is superposed at the bottom of Figure 8). Comparing line 4
with its later counterpart, line 4a, we see that the “QU” count has gone from 3 to 0.
This shows that the last item accepted (Serial No. 3657) completed the “QU” sums
requirement;

• Line (5) shows the total number of random selections made before the 120-item test
was completed;

• Line (6) shows the total number of elements rejected because their categorical
requirements had been satisfied at the time of their sampling;

• Line (7) shows the total number of sample duplicates (items already acquired for this
test form) rejected during the sampling procedure; and

• Line (8) shows that there were 120 elements in the completed test.

TABLE X. SUMMARY OF THE FIRST SET OF CGT FORMS

FIGURE 8. SAMPLE CGT PRINTOUT OF STUDY FORMAT

We chose to terminate a CGT run when the number of elements rejected for not fitting 
category requirements (Line 6, Figure 8) reached 1000. We chose this number because an estimate 
indicated that most computer assembly runs should be completed before 1000 elements were 
rejected, and, thus, the limit should provide a reasonable margin of safety against anomalous long 
runs without terminating too many runs with incomplete tests. In the first group of 25 generated 
test lists, 17 completed tests were generated with rejection totals in the range of 100 to 500 items 
per test. Complete and incomplete tests alike required approximately 4 sec of computer time per 
test form to generate. 



The actual computer costs directly associated with CGT item selections are nominal. If 
we assume that about 25 or more form lists are being assembled at one time, the cost to generate 
each test form list is approximately $2.80, including all processing and hard copy outputs. This is 
the cost, after the CGT computer technology has become operational, of assembling, in the 
sequence preferred by English Language Branch, a listing of 120 items for transmittal to the 
Sponsor. 



The study format furnished information concerning the reasons for not completing some
of the test assemblies before the 1000th reject item count was reached. It appears that, on occasion,
a test generation run would encounter difficulties because of certain imbalances in the data pool
subsets. With an ideal, numerically balanced item pool, the probability of sampling a given category
would be independent of the subset being sampled. To assure this statistical independence, the
proportion of items in an intersection subset should be the product of the proportions of the
relevant supersets. This is the empirical counterpart of the rule that, for independent events, the
joint probability is the product of the individual (unconditional) probabilities. For example, with
reference to Figure 6, since the AC/QU set is required to have 30 items, or 30/90 = 1/3 of the lexical
items per test, and the ID set must have 18 items, or 18/90 = 1/5 of the lexical items per test, then,
for a balanced, ideal pool, the AC/QU/ID subset should contain 1/3 × 1/5 = 1/15 of all the lexical items
in the data pool. When a given subset has a disproportionately large membership compared to the
(statistically independent) proportion, then with random sampling that subset will be sampled
disproportionately frequently, and this in turn will cause the branch total counts in certain
branches to be reached prematurely, closing off those branches and requiring that the remaining
totals be satisfied from other, numerically deficient, subsets. (The argument can, of course, be
restated in terms of problems caused by disproportionately small subsets.) Most incomplete tests
had accumulated approximately 115 items at their termination, using the 1000-item rejection cutoff limit.
A study of these CGTs showed that the failure to generate a complete 120-item test was most
commonly due to certain item pool numerical imbalances; because of these imbalances, the ID and
AC/DG sums requirements were both among the last to be satisfied. Thus, sometimes only AC/DG/ID
items could be accepted toward the end of a test generation run. However, since there were only
19 members of this category in the whole item pool at the time of this CGT assembly, the program was
weighted heavily against finding sufficient items to complete the test run when the previously
mentioned conditions existed. Table XI shows the actual versus ideal data subset sizes as of August 1970.
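The independence rule can be checked numerically. This Python sketch uses exact fractions. The pool size of 3826 is simply the sum of the “Ideal” column for lexical items in Table XI, and the AC/DG quota of 15 items per test is an assumption inferred from that column, not a figure stated in the text. With those inputs the rule reproduces the Ideal entries for the AC QU ID and AC DG ID subsets, and it shows how far short the actual AC DG ID membership (19 items) fell.

```python
from fractions import Fraction

LEXICAL_PER_TEST = 90  # lexical items per CGT form


def ideal_subset_size(pool_size, quota_a, quota_b, per_test=LEXICAL_PER_TEST):
    """Ideal size of the intersection of two supersets in a balanced pool.

    quota_a, quota_b: items required per test from each superset.
    Independence means the intersection share is the product of the shares.
    """
    share = Fraction(quota_a, per_test) * Fraction(quota_b, per_test)
    return pool_size * share


print(Fraction(30, 90) * Fraction(18, 90))  # 1/15, as in the text
pool = 3826  # sum of the Ideal lexical column of Table XI
print(round(ideal_subset_size(pool, 30, 18)))  # AC QU ID -> 255
print(round(ideal_subset_size(pool, 15, 18)))  # AC DG ID -> 128 (AC/DG = 15 assumed)
print(f"{19 / 128:.2f}")  # actual/ideal for AC DG ID -> 0.15
```

An actual-to-ideal ratio of about 0.15 is consistent with the observation that runs stalled while waiting for AC/DG/ID items.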



One could devise methods for modifying the computer generation program that would
circumvent incomplete test generation. For example, one could impose constraints on the sampling
procedure, in addition to English Language Branch’s content specifications, so that the computer
program would randomly sample within selected independent subsets but acquire predetermined
totals from each of the 23 subsets identified in the content requirements. We recall from Figure 6
that English Language Branch specifications can be met by constraining only 18 subset sums; we
therefore are free, at least in theory, to introduce further constraints. However, additional
constraints, while perhaps facilitating test generation, could introduce unknown characteristics into
the CGT forms. These effects would be difficult to assess, given the limited amount of pretest
experimentation possible at English Language Branch. The possibility of introducing unknown effects
by these or similar computer program modifications seemed undesirable; in addition, the occasional
generation of an incomplete test, infrequent at present, should become even less frequent as new
items from English Language Branch expand and balance the item pool. In any case, the generation
of an incomplete test in even as many as one out of three runs is a trivial matter in terms of time,
convenience, and economy. Therefore, no specific plans exist at present for dealing with incomplete
test generation.



TABLE XI. ITEM POOL COMPOSITION, 31 AUG 1970

A. Lexical Items

No.   Set Code      No. of Items
                    Actual   Ideal
 1    AC QU VO1       485     408
 2    AC QU VO2       425     612
 3    AC ST VO1       718     408
 4    AC ST VO2       579     612
 5    AC DG VO1       156     204
 6    AC DG VO2       107     306
 7    AC QU ID        107     255
 8    AC ST ID        310     255
 9    AC DG ID         19     128
10    RC VO1          307     204
11    RC VO2          243     306
12    RC ID           240     128

B. Structural Items

No.   Set Code      No. of Items
                    Actual   Ideal
13    CO              201      43
14    MO              101     170
15    PR              131     128
16    IN               80      85
17    GE               80      85
18    PA               39      85
19    VF              170     128
20    VT              163     128
21    VP               62     128
22    WO              127     128
23    CS              131     170



5. System Compatibility of Computer Programs

All of the computer programs developed during this research activity are written in the FORTRAN IV
language. They have been checked and run extensively on a CDC 6400 system using the SCOPE 3.2
executive system and the RUN compiler.

The programs were written with the idea that, at a future date, they may be transferred to another
computer system. Therefore, the input data forms are conventional 80-column fixed-length records
(key-to-tape data transcribers with variable-length records were not used); the output formats use
standard FORTRAN specifications (especially nH instead of *....*), and the Hollerith constants are
short in length (1 to 4 characters) so that they may fit the short word size of IBM systems. The
numerical calculations are of a statistical nature and should not be affected by round-off error on IBM
systems; integer arithmetic does not exceed 4 digits (also compatible with IBM’s half-word integer).
All subroutine returns are standard, and the programs do not use any of the special features that are
unique to the CDC compilers (e.g., ENCODE and DECODE). EQUIVALENCE and COMMON statements have
been set up to avoid conflicts between different systems. All output has fewer than 60 lines per page
and fewer than 132 characters per line.

All of the programs are short or built in modular pieces so that they might be used on
systems with small partitions of core storage.

The above compatibility considerations limit the speed, ease, and flexibility of writing
programs; however, it is hoped that the chosen approach will facilitate and encourage the
widespread use of these programs, thus offsetting the above-mentioned limitations.

E. Sponsor’s First CGT Evaluation 

1. Test Transmittal

Six sample test lists (CGT 8, 9, 13, 14, 21, and 22) were chosen from the first group of 
17 completed CGTs and delivered on 31 August 1970 to the Sponsor for validation. The six tests 
were selected to give two examples each of the three types of ease index distributions identified in 
Table X (1-Mode, Flat, Bimodal). At the same time, the 46 DLIEL-ECL test forms were screened, 
and three tests with characteristics similar to those of the selected CGT samples were identified. The 
three forms were proposed as criterion tests. Each of the three suggested criterion tests was similar 
to a corresponding pair of CGT sample forms in index means, content, and type of ease index 
distribution. 

2. Validation Results

Validation procedures were performed at English Language Branch, and an evaluation
report was transmitted to SwRI on 2 December 1970. It had not been feasible to use the three ECL forms
suggested as validation criterion tests; however, for each pair of CGTs, three sets of DLIEL-ECL test
forms were administered for validation. Each of the three validating ECL test administrations used
several different ECL forms.




The results of English Language Branch’s evaluations, in brief, were: 

(1) The CGTs met content specifications very well. (This was to be expected, as the
computer program ensured correct categorical subset totals.)

(2) There were no significant differences between the observed and computer-calculated
CGT index means.

(3) There were no significant differences between the index means of the CGT and 
DLIEL-ECL tests. 

(4) The CGT/CGT correlations were higher for two out of three sample pairs than for 
the corresponding ECL/CGT or ECL/ECL pairs. 

(5) The sampling of CGT test items seemed to be well distributed among the 46 ECL 
test forms. 
(6) Item duplication between CGT test forms was acceptable.

(7) The number of statistically unacceptable items (outside the ease or discrimination index
tolerance range) for each CGT remained substantially constant before and after
pretesting.

(8) The CGT reliability indexes were comparable to those of the DLIEL-ECL forms.

Interpreting the results of the validation analysis is made difficult by the presence of
several subtle factors whose influence cannot be readily assessed. These include the unavoidable use
of several DLIEL-ECL forms for each “single” criterion test administration; the use of index range
criteria to screen items at English Language Branch, in contrast to SwRI’s use of the full unrestricted
item pool for the first CGT set; effects on item index stability of a recent changeover at English
Language Branch to a different formula for calculating the item indexes; and possible effects due to
the historical grouping (influenced by item age) of items in the forms of the DLIEL-ECL tests,
contrasted with the chronologically random items assembled in a sample CGT form. These factors
suggest that validation results must be interpreted with care. With that reservation, we present a
summary of the results of the Sponsor’s correlation analysis in Table XII. It can be seen from
Table XII that, for each pretesting group (three DLIEL-ECL and two corresponding CGT scores), the
three kinds of correlation coefficients (ECL/ECL; ECL/CGT; CGT/CGT) are comparable. As has been
mentioned, two of the CGT/CGT correlation coefficients are high compared to the other coefficients
in the group, and the Sponsor’s analysis showed these to be significant (CGT #13 and #14, at the
0.01 level; CGT #21 and #22, at the 0.05 level).

The Project Technical Monitor pointed out that the number of out-of-index-range
items in these CGTs is higher than the number found in the current DLIEL-ECL tests (which
are hand screened after administration for validation purposes). The Technical Monitor suggested
that remedial measures be developed, in the hope that consistently higher intertest
correlations would result. Since a large quantity of new ease and discrimination index data was
furnished by English Language Branch at the same time for updating the original item pool, the
development of data base modifications was timely and was accordingly implemented as described in
the next sections.



TABLE XII. CORRELATION SUMMARY

CGT Pair #    ECL/ECL r    ECL/CGT r    CGT/CGT r

8 and 9         0.58         0.64         0.72
                0.72         0.65
                0.75         0.75
                             0.62
                             0.66
                             0.78

13 and 14       0.74         0.75         0.86
                0.77         0.79
                0.72         0.75
                             0.73
                             0.77
                             0.74

21 and 22       0.87         0.85         0.93
                0.83         0.84
                0.85         0.89
                             0.86
                             0.84
                             0.86



F. Item Pool Updating 

An “update” computer program was developed so that item pool changes, additions, or dele- 
tions can be made. The program is used to keep the CGT item pool status current. 
When we receive updating information and instructions from English Language Branch, the
computer program will perform the requested operations and prepare a report (discussed
below).

The staff at English Language Branch will inspect the report and, at an appropriate time,
prepare a new set of updating instructions, repeating the cycle. The CGT program will use the most
recent item file generated by the update program, unless special instructions to the contrary are
received. Copies of each new update report will be filed at English Language Branch and SwRI. In
general, the program performs the following functions on instruction:

(1) It changes or corrects existing item data by incorporating new information concerning 
item categories, objectives, answers, and/or index values. 

(2) It deletes items. 

(3) It accepts new items. 

(4) It identifies items requiring attention (input errors, or EI or DI out of range).

(5) It prepares a summary status report on the revised item pool. 

We mentioned in Section B.2 that an “index count” datum is included in the item information.
It is included because it is used to revise the item’s EI and DI as new history
accumulates on the item’s performance. Each time a new DI and EI are calculated for an item by English
Language Branch, the calculation is based on administrations to a student population of approximately
the same size. To update the cumulative index, it is therefore desirable to weight the new
index figure so that each test administration receives the same weight. For example, if an item
previously had an (accumulated) EI of 0.55 and has an EI of 0.63 from the current administration
of that item, we would compute the new EI as (0.55N + 0.63)/(N + 1), where N is the old count.
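The equal-weight update is a running mean that also advances the stored count. The Python sketch below is illustrative only; the function name and the old count N = 3 used in the example are assumptions.

```python
def update_index(old_index, old_count, new_index):
    """Return (cumulative index, count) after one more administration.

    Each administration gets equal weight: (old * N + new) / (N + 1).
    """
    new_count = old_count + 1
    return (old_index * old_count + new_index) / new_count, new_count


# Example from the text with an assumed old count N = 3:
ei, n = update_index(0.55, 3, 0.63)
print(round(ei, 4), n)  # 0.57 4
```

Because the count is returned with the index, the next update again weights every administration equally. This is the reason the index count has to be stored with each item.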

We currently classify items as out-of-range if EI < 0.30 or EI > 0.93, and/or if DI < 0.
These limits, which are incorporated in the update program, could be adjusted by a trivial change in
the program should the Sponsor revise the current index range criteria.
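The out-of-range test with the limits quoted above might be sketched as follows. The limits are kept as parameters, so a revised criterion only changes the defaults. The function and constant names are hypothetical.

```python
EI_MIN, EI_MAX, DI_MIN = 0.30, 0.93, 0.0  # current limits quoted in the text


def out_of_range(ei, di, ei_min=EI_MIN, ei_max=EI_MAX, di_min=DI_MIN):
    """True if the ease index or the discrimination index is outside tolerance."""
    return ei < ei_min or ei > ei_max or di < di_min
```

Boundary values (EI exactly 0.30 or 0.93, DI exactly 0) count as in range, matching the strict inequalities in the text.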

To demonstrate the “Update” program, a sample pool of 33 items has been created that exercises
most of the features of the program. Figure 9 is a dump of these items. For purposes of this
illustration, the “Objective” field is used to comment on the various items (normally, the item
objective appears in this field). Figure 10 is a listing of the cards used to make an update run on the
sample pool. The first 14 cards are changes to the old pool. The 15th card is a new item to be added
to the pool. Figures 11 and 12 are examples of the report format generated by the program.

Each update card is documented in the “Exceptions and Update Report” (Figure 11). The column
of item numbers in the center of the report lists all of the items updated. The column of item
numbers on the left-hand side indicates all items in the pool that have fields of information that are
out-of-range or invalid. At the end of the report, we list five transaction statistics: the number of
records read from the input master, the number of records written on the output master, the
number of new items created, the number of items deleted, and, finally, the number of items changed.
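One update run and its five transaction statistics can be sketched in Python as follows. This is an illustration, not the SwRI update program. The master file is modelled as a dictionary keyed by item serial number, and each transaction card as an (action, serial, fields) tuple. The name `apply_update` and the action strings are assumptions. Transactions that cannot be applied are collected as exceptions rather than silently ignored.

```python
def apply_update(master, transactions):
    """Sketch of one update run over the master item file.

    master: dict serial -> dict of item fields (the input master file)
    transactions: list of (action, serial, fields), action in
    {"add", "change", "delete"}; fields is ignored for "delete".
    Returns (new_master, stats, exceptions); the input master is not modified.
    """
    new_master = {s: dict(rec) for s, rec in master.items()}
    stats = {"read": len(master), "written": 0, "created": 0,
             "deleted": 0, "changed": 0}
    exceptions = []
    for action, serial, fields in transactions:
        if action == "add" and serial not in new_master:
            new_master[serial] = dict(fields)
            stats["created"] += 1
        elif action == "change" and serial in new_master:
            new_master[serial].update(fields)
            stats["changed"] += 1
        elif action == "delete" and serial in new_master:
            del new_master[serial]
            stats["deleted"] += 1
        else:
            exceptions.append((action, serial))  # cannot be applied: report it
    stats["written"] = len(new_master)
    return new_master, stats, exceptions
```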

FIGURE 11. SAMPLE UPDATE REPORT-ITEM DISPLAY

FIGURE 12. SAMPLE UPDATE REPORT-ITEM POOL SUMMARY

Figure 12 is a one-page report on the content of the entire pool. Counts and percentages are
given for individual categories, combined categories, and the distribution of the answer keys. The
number of items with out-of-range values is summarized and, finally, the pool mean EI and DI are
given.
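The pool summary described above reduces to a few counts and means. The following Python sketch assumes that each item is a dictionary with hypothetical `category`, `key`, `ei`, and `di` fields, and it uses the out-of-range limits given earlier. Combined-category counts would follow the same pattern, with a tuple of descriptors as the key.

```python
from collections import Counter


def pool_summary(items, ei_min=0.30, ei_max=0.93, di_min=0.0):
    """Counts, percentages, out-of-range total, and mean indexes for a pool."""
    if not items:
        raise ValueError("empty item pool")
    n = len(items)
    categories = Counter(item["category"] for item in items)
    keys = Counter(item["key"] for item in items)
    out_of_range = sum(
        1 for item in items
        if item["ei"] < ei_min or item["ei"] > ei_max or item["di"] < di_min
    )
    return {
        "categories": {c: (k, 100.0 * k / n) for c, k in categories.items()},
        "answer_keys": dict(keys),
        "out_of_range": out_of_range,
        "mean_ei": sum(item["ei"] for item in items) / n,
        "mean_di": sum(item["di"] for item in items) / n,
    }
```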

Figure 13 shows a dump of the new “Master Item Pool File.”

G. Second Prototype CGT Set 

Prior to generating the second set of CGTs, we incorporated updating information, furnished
by English Language Branch, on the ease and discrimination index counts of 3746 data pool items.
After this revision, it was found that there were 549 out-of-range items in the pool. On instructions
from the Technical Monitor, we removed all those items from the pool and constructed a new
updated file that contained 4848 “good” in-range items. The program described in the preceding
section (F) was used for the updating; the category summary of this revised item pool is shown in
Figure 14.

Using the new item pool, the CGT program assembled a second prototype set of 25 ECL test 
form lists. The characteristics shown by the CGT program during this second run substantially 
duplicated the features encountered during the first CGT assembly. Once again, the set of 25 tests 
contained 17 completed CGTs, and the history of test generation, the reasons for not completing all 
tests, and other general features of the computer program performance were substantially identical 
to the characteristics described for the first CGT set in Section D.4. The slight reduction in size of 
the item pool used for the second set (4848 items versus 5111 items in the first 25 CGT data pool) 
did not appear to affect the effectiveness of the program. Since not only the item pool size but also 
its categorical composition remained approximately the same after the final update, the similarities 
of outcome of the two test generation assembly runs seem reasonable. 

The characteristics of the 17 completed CGTs are shown in Table XIII. Comparing Tables XIII
and X, we note that the major difference is the absence of out-of-index-range items in the second
set of tests. This is, of course, the direct consequence of the Update program’s removal of
out-of-range items from the item pool. The mean EI of the second set of 17 tests was slightly higher
than that of the first set of 17 tests (0.646 versus 0.626), while the mean DI was essentially the
same for both sets of 17 tests (0.194 versus 0.192).

From the second set of test forms lists, six representative CGTs were selected and transmitted 
to English Language Branch on January 5, 1971. These research end items were presented to the 
Sponsor for approval. 

H. Sponsor’s Second CGT Evaluation 

The evaluation study performed at English Language Branch on the second set of CGT tests 
supported the conclusions reported in Section E (sponsor’s first CGT evaluation). It appears that 
the computer program is selecting 120-item sets which generate valid ECL test forms. 

In summary, English Language Branch’s conclusions from the second evaluation were:

(1) The CGTs precisely met the content specifications of the model ECL tests.

(2) There were no significant differences between the observed and computer-calculated CGT
index means. This implies that the cost of validating CGT forms may be lessened; the test
index means may no longer require recalculation after administration, since the
computer-calculated index means appear to be acceptable estimators.
(3) There were no significant differences between the index means of the CGT and
DLIEL-ECL tests. 

(4) The CGT/CGT correlations were higher than the corresponding ECL/CGT correlations
(typically, by 0.05 to 0.09).

(5) The sampling of CGT test items seemed to be well distributed among the 50 different 
ECL tests. 

(6) There were no unacceptable items before pretesting, since these were removed from the 
item pool. After pretesting, 46 items emerged out-of-range (according to calculations 
based only on these pretesting data) from the 6 CGT prototype forms. 

(7) The CGT reliability indexes were comparable to those of the DLIEL-ECL forms; the 
reliability index (K-R No. 21) for the 6 tests ranged between 0.91 and 0.93. 
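For reference, the K-R No. 21 index cited in item (7) is Kuder-Richardson formula 21, r = k/(k − 1) · [1 − M(k − M)/(k s²)], where k is the number of items, M is the mean total score, and s² is the variance of the total scores. A minimal Python sketch follows. The use of the population variance (dividing by n) is an assumption, since texts differ on this point, and the sketch does not reproduce the Sponsor’s actual computation.

```python
def kr21(scores, k):
    """Kuder-Richardson formula 21 reliability estimate for a k-item test.

    scores: total (raw) scores of the examinees. Uses the population
    variance of the totals (an assumption; some texts divide by n - 1).
    """
    n = len(scores)
    if n == 0 or k < 2:
        raise ValueError("need scores and at least two items")
    mean = sum(scores) / n
    var = sum((x - mean) ** 2 for x in scores) / n
    if var == 0:
        raise ValueError("KR-21 is undefined when all scores are equal")
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * var))


print(round(kr21([2, 4, 6, 8], 10), 4))  # 0.5556
```

KR-21 needs only the total-score mean and variance, not item-level responses. This makes it convenient for comparing forms such as the CGT and DLIEL-ECL tests.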

On the basis of these results of English Language Branch’s evaluation, it appears that the overall
performance of the CGT forms was improved as a result of the item pool update, which removed all
out-of-range items. Further, the evaluation recommended that the 6 forms “be put into operational
use at DLIEL.”

I. Other Related Efforts 

A small amount of project time was devoted to acquainting ourselves with relevant techniques
and developments in the fields of computer-generated tests and analysis of test data. This section
summarizes these supplementary efforts.

Our Technical Monitor called attention to a computer test generation project conducted by
the U.S. Army.(1) In following up his suggestion, we held several conversations with personnel of
the U.S. Army Enlisted Evaluation Center, Fort Benjamin Harrison, Indiana (Dr. R. O. Waldkoetter
and Mr. J. L. Finucane). We also had the opportunity to examine relevant report drafts made
available courtesy of these researchers. It was concluded that, while the Army’s project was of
intrinsic interest, the problems it was addressing differed from ours in major respects,
so that the MOS Item Bank techniques were not directly applicable to our program development at
this time. However, it is quite possible that further developments at Fort Benjamin Harrison and at
SwRI may make the cited effort applicable to our program, and we plan to remain in communication
with the staff of the U.S. Army Enlisted Evaluation Center.

A second development called to our attention by the Project Monitor was the potential
application of a “Rasch” model to ECL tests.(2) The referenced paper was reviewed, but it was
concluded that a detailed investigation of this methodology would be required before the usefulness
of the model could be properly established and that such an investigation was outside the scope
of the present program.

A third area of preliminary investigation concerned the automatic generation of transformation
graphs and charts by computer. At the present time, pretesting of a new ECL form at English
Language Branch is followed by time-consuming hand preparation of a transformation graph, which
is used to convert the new test raw scores to ECL scores. The utility of a computer program using
polynomial least-squares approximations was briefly explored and looks promising at this
time. Figure 15 shows the result of this investigation: a scatter diagram for a new test and an ECL
criterion test, together with a comparison of the hand-prepared transformation graph and
the machine-computed second- and fourth-order polynomial approximations. We note the good
agreement between the graphs in the higher scoring ranges; in this particular pretesting population,
there were few low scorers, so the transformation graphs in the lower half of the score range
tend to be less meaningful.

(1) Finucane, J. L., “Development of Specification for MOS Test Item Bank,” Proceedings of the 11th Annual
Conference, Military Testing Association, 1969, pages 2?–34.

(2) Moonan, W. J., “Evaluating Trainee Test Performance By a G. Rasch Measurement Model: A Dialogue,” Paper
at the 11th MTA Conference, 1969.




FIGURE 15. DLIEL ECL/CGT SCATTERGRAM AND COMPARISON OF HAND 
VERSUS COMPUTER-GENERATED TRANSFORMATION GRAPHS 
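A polynomial least-squares fit of the kind explored here can be sketched in pure Python by solving the normal equations. This is an illustration, not the program used for Figure 15. The function names and the choice of normal equations with Gaussian elimination are assumptions. For fourth-order fits on raw scores up to 120, the normal equations are poorly conditioned, so it would be advisable to rescale scores to roughly [0, 1] first or to use a library routine such as NumPy’s `polyfit`.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (sketch).

    Returns coefficients [c0, c1, ..., c_degree] for y = c0 + c1*x + ...
    """
    m = degree + 1
    # Normal equations A c = b: A[i][j] = sum x^(i+j), b[i] = sum y * x^i
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("singular system: too few distinct x values")
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0] * m
    for i in range(m - 1, -1, -1):
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, m))) / a[i][i]
    return coef


def evaluate(coef, x):
    """Convert a raw score x with the fitted polynomial."""
    return sum(c * x ** i for i, c in enumerate(coef))
```

In use, `xs` would hold new-test raw scores and `ys` the paired ECL scores from a pretesting group. `evaluate` then plays the role of the transformation graph.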